ABSTRACT

This thesis presents a statistical analysis of the
monthly rainfall for the Monterey Peninsula and the Carmel
Valley in Central California. The analysis begins with the
simple first-order autoregressive Markov model, which is
found to be weak. Next, 2x2 contingency tables are used
to identify predictors, one of which is found to be January
rainfall. Finally, logistic analysis is used to quantify
the predictive ability of January.

This paper attempts to analyze rainfall time series
in the statistical sense. No attempt is made to provide
a physical explanation of the findings from the point of
view of a meteorologist.
I. INTRODUCTION


A. THE PROBLEM

The Monterey Peninsula Water Management District, in
the Central California coastal area, has as one of its
responsibilities the duty to recommend and/or impose water
rationing on its constituents. To do this in a rational way
requires the District to have some formula for predicting
future water availability. Although the techniques of modern
meteorology are becoming more sophisticated and exact, there
is still no ability to make good long-range predictions. This
thesis analyzes three series of Monterey County monthly
rainfall data by purely statistical methodology in order to
identify possible predictive formulas.


B. NOTATION 

Rainfall will be denoted as R_{t,m}, which will represent
inches of rain recorded for the t-th year and the m-th month.
The year to be used is the California Water Year, which begins
in October and ends the following September. Thus R_{1,1} is
the monthly rainfall for October of year '1' and R_{6,8} is
the monthly rainfall for May of year '6'.

An overstruck bar, as in \bar{R}_{..}, will indicate the
arithmetic average of a variable; in this case it is the
arithmetic average of all years and months of rainfall.
\bar{R}_{.m} is the average of rainfall over the years for
month m; \bar{R}_{t.} represents the yearly average for year t.


C. METHODS OF ANALYSIS
Three methods were used to analyze the data. The first
method was to model the series using autoregressive moving
averages as described in Box and Jenkins [Ref. 1]. The
second was to use 2x2 contingency tables to identify possible
predictors. The third was logistic regression to quantify
the findings of the 2x2 contingency table analysis. These
three methods are described in later sections of this
paper.
1. ARMA(p,q) Models
A widely used approach to time series modeling proposed
by Box and Jenkins is the ARMA(p,q) model. This model
is actually a joining of two types of model, the
autoregressive and the moving average.
In the notation of Box and Jenkins:
let {z_t, t=1,2,...,n} be a time series; then an ARMA(p,q)
process may be written as:

\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + ... + \phi_p \tilde{z}_{t-p} + a_t - \theta_1 a_{t-1} - ... - \theta_q a_{t-q}     I.1

where the {a_t} are assumed to be random shocks
distributed as independent and identically distributed (iid)
random variables with mean zero and variance \sigma_a^2, and
\tilde{z}_t = z_t - \mu. The further assumption of normality is also
usually made.
For purposes of this paper, a mapping of R_{t,m}
to z_r, r=12(t-1)+m, was made, and an ARMA analysis was
then conducted on this index-transformed series. This
analysis is described in section III.
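
The index mapping r = 12(t-1) + m can be sketched as follows. This is a minimal illustration rather than code from the thesis; the function names and the month list are assumptions introduced here, with months numbered in water-year order (October = 1).

```python
# Water-year month order: month 1 is October, month 12 is September.
WATER_YEAR_MONTHS = ["Oct", "Nov", "Dec", "Jan", "Feb", "Mar",
                     "Apr", "May", "Jun", "Jul", "Aug", "Sep"]


def series_index(t, m):
    """Map (water year t, month m), both 1-based, to the series index r."""
    if t < 1 or not 1 <= m <= 12:
        raise ValueError("t must be >= 1 and m must be in 1..12")
    return 12 * (t - 1) + m


def year_month(r):
    """Inverse mapping: recover (t, m) from the 1-based series index r."""
    if r < 1:
        raise ValueError("r must be >= 1")
    t, m = divmod(r - 1, 12)
    return t + 1, m + 1


def flatten(rain_by_year):
    """Turn a list of 12-value water years into one monthly series z_1..z_n."""
    return [value for year in rain_by_year for value in year]
```

For a record starting in October 1951, `series_index(13, 4)` gives 148, the index of January 1964.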
2. 2x2 Table Analysis 

In the validation of section IV it is found that 
the ARMA model is not very successful in describing the 
data. In section V the data is analyzed by means of 2x2 
contingency tables. These tables are good tools for explo- 
ratory data analysis in that they provide a visual display 
of the data. Statistical procedures based on the null 
hypothesis of independence can be used to quantify the 
departure from independence. The theory of 2x2 tables, 
and contingency tables in general may be found in Fleiss 
[Ref. 3], Dixon and Massey [Ref. 5], Brownlee [Ref. 6], 
and Mood, Graybill, and Boes [Ref. 7]. 

For this paper, the contingency table approach is 
used to identify a month or group of months of a year whose 
rainfall can serve as a predictor for the rainfall during 
the remaining months of the year. One predictor that was 
suggested is the rainfall in the month of January. 
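
As a minimal sketch of the idea, the following builds a 2x2 table from paired yearly values and computes the usual Pearson chi-square statistic for independence (one degree of freedom, no continuity correction). The split at the sample mean, the function names, and the absence of a continuity correction are assumptions made for illustration; they are not taken from the thesis.

```python
def build_2x2(predictor, predictand):
    """Cross-classify years by whether each variable is at or above its mean.

    Rows index the predictor (0 = below mean, 1 = at/above mean);
    columns index the predictand in the same way.
    """
    if len(predictor) != len(predictand) or not predictor:
        raise ValueError("need two non-empty sequences of equal length")
    mean_x = sum(predictor) / len(predictor)
    mean_y = sum(predictand) / len(predictand)
    table = [[0, 0], [0, 0]]
    for x, y in zip(predictor, predictand):
        table[int(x >= mean_x)][int(y >= mean_y)] += 1
    return table


def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom
```

A large statistic indicates a departure from independence between the candidate predictor and the rest-of-year rainfall; a table with equal cells gives zero.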

3. Logistic Analysis

Once a predictor is tentatively identified it becomes
necessary to quantify the degree, direction and accuracy of
the predictor.

A logistic analysis is conducted by dividing the data
for a year into two sets, the predictor or control set, and
the predictand or complement set. For this analysis, the predictor
is the logged anomaly of January rainfall for the year;
that is, if X_t denotes the predictor or control for year
t, then

X_t = \ln(R_{t,4}) - \frac{1}{N} \sum_{t=1}^{N} \ln(R_{t,4})

(The logarithm is used to better symmetrize the model.) The
complement is the raw anomaly of the total rainfall for the
immediately subsequent eleven months; that is, if y_t denotes
the complement for year t, then

y_t = \left( \sum_{m=5}^{12} R_{t,m} + \sum_{m=1}^{3} R_{t+1,m} \right) - \frac{1}{N} \sum_{t=1}^{N} \left( \sum_{m=5}^{12} R_{t,m} + \sum_{m=1}^{3} R_{t+1,m} \right)

Finally, the data are transformed into a binary representation,
relative to zero, as:

Y_t = 0 if y_t < 0
Y_t = 1 if y_t \geq 0

In section VI the model fit is

P(Y=1|X=x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}

where x is as before and P(Y=1|X=x) is interpreted as:
"the conditional probability that the subsequent eleven-month
total rainfall will be above its mean, given that the logged
anomaly of January rainfall was 'x'."
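
The three ingredients of the logistic analysis (logged January anomaly, binary indicator for the following eleven months, and the logistic curve) can be sketched as below. This is an illustrative outline only: the function names are hypothetical, the coefficients `alpha` and `beta` are placeholders to be supplied by a fitting procedure that is not shown, and the January values are assumed to be strictly positive so that the logarithm exists.

```python
import math


def logged_anomalies(january_rain):
    """X_t: log of January rainfall minus the mean log over all years."""
    logs = [math.log(r) for r in january_rain]  # requires r > 0
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def binary_indicator(eleven_month_totals):
    """Y_t: 1 if the eleven-month total is at or above its mean, else 0."""
    mean_total = sum(eleven_month_totals) / len(eleven_month_totals)
    return [1 if total - mean_total >= 0 else 0
            for total in eleven_month_totals]


def logistic_probability(x, alpha, beta):
    """P(Y=1 | X=x) = exp(alpha + beta*x) / (1 + exp(alpha + beta*x))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
```

With `alpha = 0` an average January (x = 0) gives a probability of one half, and a positive `beta` makes a wet January raise the probability of an above-average remainder.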
II. THE DATA


A. GENERAL

Three data sets were used for this analysis. The locations
at which these data sets were gathered are shown in Figure 1.
As the figure indicates, two of the data sets are on the
Monterey Peninsula proper, while the third set, SC, represents
the Carmel River Watershed at the San Clemente Dam.

Although data exist in all cases to the present, all
three sets were truncated at September of 1974. The remaining
data, up through September of 1980, were reserved for
validation of the models and methodology.

The data coordinates are: 


Data set RN: 36° 35' 42" North Latitude
121° 54' 43" West Longitude


Data set FL: 360 a5 GOs ea North Latitude 
121° 56' 30" West Longitude


Data set SC: 36° 26' 12" North Latitude
121° 42' 30" West Longitude
B. DATA SET RN

Data set RN consists of monthly rainfall amounts gathered
by Professor R.J. Renard, Cooperative Observer for the National
Weather Service Climatological Station, Monterey, California.
The data set begins in June 1951 and currently terminates in
September 1980. As was stated above, the analysis was
conducted only on the data between and including October 1951
and September 1974.

1. Raw Data

Appendix A contains a listing of data set RN. Figure
2 shows the raw data set. Month 1 is October 1951, month 148
is January 1964, and so on up to month 288, which is September 1974.
As can be seen, the data are strongly seasonal. This is enough
to indicate that the series, as stated, is highly
non-stationary.

The data presented so far deal only with monthly
data. Next to be considered is the series of yearly total
rainfalls. The results are shown in Figures 3 (Yearly total
rainfall) and 4 (Correlogram of yearly rainfall), and Table 1
(Estimated Autocorrelations). In this case, the correlogram
indicates stationarity and independence of the yearly series.
A plot of the lag one relationships, Figure 5, reinforces
this indication of independence.
The correlograms and partial correlograms to follow
indicate the approximate 95% significance levels using
dashed lines. For development of these significance
levels see Box and Jenkins [Ref. 1].
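
A correlogram of this kind can be sketched in a few lines. The code below is a hedged illustration, not the thesis's program: it uses the common sample autocorrelation estimator and, as a simplifying assumption, the white-noise band of plus or minus 1.96 divided by the square root of the series length as the approximate 95% level. Box and Jenkins develop the levels more carefully; the function names are hypothetical.

```python
import math


def autocorrelations(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series."""
    n = len(series)
    if n < 2 or not 0 < max_lag < n:
        raise ValueError("need 0 < max_lag < len(series)")
    mean = sum(series) / n
    dev = [value - mean for value in series]
    c0 = sum(d * d for d in dev)  # lag-zero sum of squares
    if c0 == 0:
        raise ValueError("series is constant")
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


def white_noise_band(n):
    """Approximate 95% band for autocorrelations of an independent series."""
    return 1.96 / math.sqrt(n)


def significant_lags(series, max_lag):
    """Lags whose sample autocorrelation falls outside the band."""
    band = white_noise_band(len(series))
    return [k for k, r in enumerate(autocorrelations(series, max_lag), start=1)
            if abs(r) > band]
```

For a series of 288 monthly values the band is roughly 0.115, so a lag-one autocorrelation near 0.19 would stand out while most higher lags would not.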
Figure 3. Yearly total rainfall for data set RN

Figure 4. Correlogram of yearly total rainfall
for data set RN.

TABLE 1

ESTIMATED AUTOCORRELATIONS OF YEARLY
TOTAL RAINFALL FOR DATA SET RN

Figure 5. Lag one plot of yearly rainfall data
for data set RN.


2. Swept Data

Pierce [Ref. 9] and Hipel [Ref. 11] suggest various
ways to remove the seasonality of data sets like RN, FL, and
SC. The most basic and straightforward of these methods
is to remove the monthly means. This is accomplished
by the following replacement:
let

R'_{t,m} = R_{t,m} - \bar{R}_{.m}     II.5

where \bar{R}_{.m} represents the mean of month m.
One statistic that is a byproduct of the calculation of
\bar{R}_{.m} is S_m^2, defined as the estimated variance of the
monthly data points:

S_m^2 = \frac{1}{N-1} \sum_{t=1}^{N} (R_{t,m} - \bar{R}_{.m})^2     II.6

These statistics for data set RN are shown in Table 2 and
illustrated in Figure 6. In the same way as the raw data
were mapped into a series, a series is created from
R'_{t,m} as:

z_r = R'_{t,m}, r=12(t-1)+m
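
The sweeping step can be sketched as follows. This is a minimal illustration under the assumption that the data are held as a list of water years, each a list of twelve monthly values; the function names are hypothetical and at least two years are required for the variance.

```python
def monthly_means_and_variances(rain_by_year):
    """Per-month mean and (N-1)-divisor variance across N water years."""
    n = len(rain_by_year)
    if n < 2 or any(len(year) != 12 for year in rain_by_year):
        raise ValueError("need at least two years of 12 monthly values")
    means = [sum(year[m] for year in rain_by_year) / n for m in range(12)]
    variances = [sum((year[m] - means[m]) ** 2 for year in rain_by_year) / (n - 1)
                 for m in range(12)]
    return means, variances


def sweep_monthly_means(rain_by_year):
    """Subtract each month's mean and return the anomalies as one series."""
    means, _ = monthly_means_and_variances(rain_by_year)
    return [year[m] - means[m] for year in rain_by_year for m in range(12)]
```

The returned series is already in index order r = 12(t-1) + m, so it can be passed directly to a correlogram routine.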


TABLE 2 


MONTHLY MEANS AND VARIANCE FOR DATA SET RN 


MONTH MEAN VARIANCE
re noes Rea 
2 Za413 36594 
3 5 Ro i. Bree ea 
= 4.124 5.4146 
5 2.992 BEA Sih 7 
6 2059 3.7414 
Figure 6. Monthly means for data set RN

Figure 7. Monthly rainfall anomalies in inches
for data set RN

Figure 8. Correlogram of the monthly rainfall
anomalies for data set RN


TABLE 3 


ESTIMATED AUTOCORRELATIONS OF MONTHLY RAINFALL 
ANOMALIES FOR DATA SET RN 


LAG VALUE LAG VALUE 
A 5 BENS, 14 5 ls) I 
Z ~.090 iS =) Leh) 
3 -.059 16 -.006 
4 ~041 ae, SOE) 
= Boos lke .004 
6 =, 035 19 FOS 
Hi ey Olle 20 ~.00/7 
8 -.043 2. J Olee 
9 = 3 Ze 2023 

10 = 5 (08/0 23 ~.066 

vy SOLIS 24 ~.044 

ae = (0-410) 2 e021 
ile =,012 
3. Logged and Swept Data

The data should now be stationary in the means.
However, as seen in Table 2, the variances of monthly
rainfall amounts are not homogeneous. Kilmartin [Ref. 10]
discusses various transformations of the data to remove
this heteroskedasticity. A plot of the variance versus mean,
Figure 9 below, indicates that the logarithmic transform
of the data might be useful.
Figure 9. Plot of monthly variance against
monthly means for data set RN

Since the data contain zeros, the following modified
logarithmic transformation is done:

R'_{t,m} = \ln(R_{t,m} + 1) - \frac{1}{N} \sum_{t=1}^{N} \ln(R_{t,m} + 1)

where the effect of the addition of the one is mostly to
preserve the mapping of zeros into zeros. A more in-depth
discussion of this transformation is found in Kilmartin.
The mapping is performed again as before, and \bar{R}'_{.m} and
S'^2_m are calculated in a manner similar to II.6 and
shown in Table 4 and Figures 10 and 11.
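
A hedged sketch of the logged-and-swept transformation: each value is replaced by ln(R + 1), and the per-month mean of the logged values is then removed. The data layout (a list of twelve-value water years) and the function name are assumptions for illustration only.

```python
import math


def logged_and_swept(rain_by_year):
    """Apply ln(R + 1), then subtract each month's mean of the logged values.

    Returns the anomalies as a single series in index order r = 12(t-1) + m.
    """
    n = len(rain_by_year)
    if n == 0 or any(len(year) != 12 for year in rain_by_year):
        raise ValueError("need at least one year of 12 monthly values")
    # Adding one keeps zero rainfall finite: ln(0 + 1) = 0.
    logged = [[math.log(value + 1.0) for value in year] for year in rain_by_year]
    means = [sum(year[m] for year in logged) / n for m in range(12)]
    return [year[m] - means[m] for year in logged for m in range(12)]
```

A month that is dry in every year keeps an anomaly of exactly zero, which is the property the added one is meant to preserve.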


TABLE 4 


MONTHLY MEANS AND VARIANCE FOR 
LOGGED DATA SET RN 


MONTH MEAN VARIANCE 
i ~438 ~ 1549 
2 e092 ~ 3454 
3 oe ei) Sie 
= eS 39 7200;3 
2) Ps 04 oo 
6 Ege ee 5 ONS 
7 ~854 A 5 PAs: 
8 a bls 20767 
5) SLES ~0401 

EO .054 .0045 

ee .094 ~0094 

Le SNS) .0849 
Figure 10. Monthly means of logged data set RN

Figure 11. Plot of monthly variance against monthly
means for logged data set RN

These transformations, the logarithm followed by
the removal of the monthly means of the logged data, result
in the series listed in Appendix A and described in Figures
12 and 13 with Table 5.

These displays indicate that a suitably stationary
series has been obtained. Other methods, such as differencing,
scaling, and Box-Cox transformations (see Hipel [Ref. 11]),
were tried but with less success.
Figure 12a. Logged anomalies of monthly
rainfall for data set RN. Months 1 - 148

Figure 13. Correlogram of logged anomalies of
monthly rainfall from data set RN

TABLE 5


ESTIMATED AUTOCORRELATIONS OF LOGGED 
ANOMALIES OF MONTHLY RAINFALL FROM 
DATA SET RN 


AUTOCORRELATIONS 
LAG VALUE LAG VALUE 
il 6 ALS yab 14 -.033 
2 =e HO) iS -.084 
3 =~ 09) 16 ~.024 
+ 2071 IL .062 
=) 6 OE ir cy 0) lie 
6 =o las Lg ~.008 
7 -.009 20 OS 
8 = Ocke: oe Sols 
9 = BY ING = (0) S68: 
10 ~.069 a ~.024 
iat .024 24 =O 
IL -.004 Z5 AOSZ 
ie) nO SZ 


C. DATA SET FL

The label for these data derives from their location,
Forest Lake, on the Monterey Peninsula, in Pebble Beach,
California. Data set FL consists of monthly rainfall
figures gathered by the California-American Water Company
since 1896. Although this data set started quite early,
the data prior to 1937 have frequent missing observations.
Therefore, this data set is taken as October 1937 through
September 1974, with October 1974 through September 1980
reserved for validation.

Analysis of this data set is identical to that of data
set RN; therefore only the pertinent figures and tables
are shown.

Figure 14b. Months 297-444 of rainfall
in inches for data set FL

Figure 15. Yearly total rainfall for data set FL
(1937 - 1974).

Figure 16. Correlogram of yearly total rainfall
for data set FL


TABLE 6


ESTIMATED AUTOCORRELATIONS OF YEARLY TOTAL RAINFALL
FOR DATA SET FL


AUTOCORRELATIONS
LAG VALUE LAG VALUE 
L .o10 14 SO 
2 ,0ee Ike. ,105 
e Sar, 16 =o! 
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5 <6 De 13 5,0 
2. Swept Data


TABLE 7 


MONTHLY MEANS AND VARIANCE FOR DATA SET FL 


MONTH MEAN VARIANCE 
Figure 17. Monthly means for data set FL

Figure 18b. Months 297-444 of rainfall anomalies
in inches for data set FL

Figure 19. Correlogram of monthly rainfall
anomalies for data set FL

TABLE 8


ESTIMATED AUTOCORRELATIONS OF MONTHLY 
RAINFALL ANOMALIES FOR DATA SET FL 


AUTOCORRELATIONS 
LAG VALUE LAG VALUE 
ik ~244 14 =.053 
2 ~.007 ES ~.043 
3 -.056 16 ~.002 
4 O24 si 5 Ola 
S) SBE 18 ~.014 
6 -.026 189 -.O1l1 
7 -.022 20 -.031 
8 ae 0/4 3h Za) mz 3 
9 -~.051 22 OST 
10 = 610210 16,8 -.007 
JJ “OTT 24 -041 
IL 2009 25 OSS, 
SIRS: 068 


3. Logged and Swept Data 
Figure 20. Plot of monthly variance against monthly
means for data set FL

TABLE 9
MONTHLY MEANS AND VARIANCE FOR LOGGED DATA SET FL


MONTH MEAN VARIANCE 
ul ~ 484 - 1440 
2 e007 - 3440 
3 re 210 2 se VOL 
4 IL Ses se dae, 
Figure 21. Monthly means of logged
data set FL

Figure 22. Plot of monthly variance against monthly
means for logged data set FL

Figure 23a. Months 1 - 148 of logged rainfall
anomalies for data set FL

Figure 23b. Months 149 - 444 of logged rainfall
anomalies from data set FL

Figure 24. Correlogram of logged anomalies of monthly
rainfall from data set FL.


TABLE 10 


ESTIMATED AUTOCORRELATIONS OF LOGGED 
ANOMALIES FROM MONTHLY RAINFALL OF 
DATA SET FL 


AUTOCORRELATIONS ~ 
LAG VALUE LAG VALUE 
i .185 14 -.052 
2 -.020 15 -.021 
3 -.081 16 20 
4 046 117) .040 
5 .043 a: -.040 
6 -.050 19 -.025 
7 -.024 20 -.014 
8 -.010 2 .007 
9 -.027 22 .019 
10 .004 23 -.015 
aa) 50h 24 .076 
i .084 25 .047 
D. DATA SET SC

The label for these data derives from their location, San
Clemente Dam, on the Carmel River in Central California,
approximately 26 kilometers southeast of data sets RN and
FL on the Monterey Peninsula. Data set SC consists of
monthly rainfall figures gathered by the California-American
Water Company since 1926.

Analysis of this data set is again very close to that
of the previous data sets, and only the displays will be
given.


1. Raw Data

Figure 25a. Months 1 (October 1926) - 148
(January 1938) of rainfall in inches for
data set SC.

Figure 25b. Months 149 - 444 of rainfall
for data set SC

Figure 25c. Months 445 - 576 of rainfall for
data set SC

Figure 26. Yearly total rainfall for data
set SC (1926 - 1974)

Figure 27. Correlogram of yearly total
rainfall for data set SC


TABLE 11


ESTIMATED AUTOCORRELATIONS OF YEARLY
TOTAL RAINFALL FOR DATA SET SC

TABLE 12


MONTHLY MEANS AND VARIANCES 
FOR DATA SET SC 


MONTH MEAN VARIANCE 
i =G9'6 -9945 
2 Pane at pe, LE 2 
3 3.940 Oy oS 
4 4.999 8.4899 
5 oD Ss 13.3443 
6 She U25(0) 9.2744 
ii Jig JOG, 3.4486 
8 ae Eo AZ 
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10 5 oly 20052 
Figure 28. Monthly means for data set SC

Figure 29a. Months 1 - 296 anomalies in inches
for data set SC

Figure 29b. Months 297 - 376 anomalies in
inches for data set SC

Figure 30. Correlogram of monthly rainfall
anomalies for data set SC


TABLE 13


ESTIMATED AUTOCORRELATIONS OF MONTHLY
RAINFALL ANOMALIES FOR DATA SET SC

3. Logged and Swept Data

Figure 31. Plot of monthly variance against monthly
means for data set SC.

TABLE 14


MONTHLY MEANS AND VARIANCES OF
LOGGED DATA SET SC


MONTH MEAN VARIANCE 
IL ~444 EOS 
2 yekeyit m5 2299 
3 1.408 . 3949 
~ oes woul ° 
5 1.444 74928 
6 ieee 5956 
7 aoe ag ~ 3247 
8 2520 g09 26 
9 092 OZ 
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az BOS = 0352 
Figure 32. Monthly means of logged data set SC

Figure 33. Plot of monthly variance against monthly
means for logged data set SC

Figure 34a. Months 1 - 296 of logged rainfall
anomalies from data set SC

Figure 34b. Months 297 - 576 of logged rainfall
anomalies from data set SC

Figure 35. Correlogram of logged anomalies of
monthly rainfall from data set SC


TABLE 15 


ESTIMATED AUTOCORRELATION OF LOGGED 
ANOMALIES OF MONTHLY RAINFALL FROM 
DATA SET SC 


AUTOCORRELATIONS 
LAG VALUE LAG VALUE 
a SOLS 14 -.057 
2 = ss) LS ae OAs 
5 -.066 16 nO PR, 
4 7038 17 aE 
5 a OMEZ 18 -.017 
6 -.061 We, 10 OOS 
"| 60:23 20 -.008 
8 -.001 Jal, =< (oI IL 
9 = 5 (USE 22 OZ 
10 -.021 23 =. 019 
Je OO 24 7050 
EZ, moo 1 Zi OWwiZ 
LS «09. 
III. FIRST ORDER MARKOV MODEL


A. THEORY

As first shown by equation I.1, the general ARMA(p,q)
model is:

\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + ... + \phi_p \tilde{z}_{t-p} + a_t - \theta_1 a_{t-1} - ... - \theta_q a_{t-q}     III.1

The development and discussion of this type of model is
contained in detail in Box and Jenkins [Ref. 1] and Nelson
[Ref. 8]. The modeling process is a threefold procedure.
The parts are:

(1) Identification

(2) Estimation

(3) Diagnosis.

Identification is conducted using the correlogram and
a plot of the partial autocorrelations (or partial
correlogram). For how the partial autocorrelations are related
to the autocorrelations, see Box and Jenkins [Ref. 1], Nelson
[Ref. 8], or Richards and Woodall [Ref. 12]. These partial
autocorrelations are used to determine the order of the
autoregressive process, much as the autocorrelations may
be used to determine the order of the moving average
process.

Once the autocorrelations and partial autocorrelations
have been found, the degree of the ARMA may be estimated by
techniques described in Box and Jenkins, Nelson or Richards
and Woodall. Each of the data sets, once logged and swept,
indicated that the most probable model was an ARMA(1,0) or
AR(1) or, more commonly, a first-order autoregressive Markov
model. This model is simply:

\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + a_t     III.2

where \phi_1 is the autocorrelation of lag one. Thus, this
model indicates that any persistence in the data is
conditionally independent of the past given the lag one value.
Subsections B, C, and D below show this model as applied
to the three data sets of interest. The residuals of the
model, \tilde{z}_t - \phi_1 \tilde{z}_{t-1}, are examined. The residuals appear to
be independent, however, they do not appear to be normally 
distributed; for example, there is a high peak around zero. 
One possible reason for this discrepancy may be the dichotomy
of winter and summer rain as indicated in Tables 2, 4, 7, 9,
12, and 14. The existence of months with zero rainfall during
the summer suggests that one should consider the summer, when
rain is sparse, completely separately from the winter, when rain
is more abundant. Therefore, also shown in the subsections
below is the autoregressive model applied to the winter months
only. This is accomplished by stripping out months 9 through
12 (June through September) of the data sets and treating the
remaining data as a continuous set. In other words, the
first ten months are then

R_{1,1}, R_{1,2}, ..., R_{1,8}, R_{2,1}, R_{2,2}.

The appropriate correlograms and partial correlograms are
displayed prior to the model applications.
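
The winter-only series and the first-order Markov fit described in this subsection can be sketched as follows. This is an illustrative outline with hypothetical function names; it assumes the input is the logged and swept monthly series in water-year order, and it estimates the coefficient by the lag-one sample autocorrelation, as the text describes.

```python
def winter_only(series, months_kept=8):
    """Keep water-year months 1..months_kept (October-May) of every year."""
    return [value for i, value in enumerate(series) if i % 12 < months_kept]


def lag_one_autocorrelation(z):
    """Sample autocorrelation at lag one, used here as the AR(1) coefficient."""
    n = len(z)
    mean = sum(z) / n
    dev = [value - mean for value in z]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant")
    return sum(dev[i] * dev[i + 1] for i in range(n - 1)) / c0


def ar1_residuals(z, phi):
    """Residuals z_t - phi * z_(t-1) for an already mean-removed series."""
    return [z[i] - phi * z[i - 1] for i in range(1, len(z))]
```

A typical use would be `phi = lag_one_autocorrelation(z)` followed by `ar1_residuals(z, phi)`, with the residuals then passed to the independence and normality checks.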


B. DATA SET RN
1. Twelve Month Series
This data set is described in section II.B. The
remaining diagnostic device needed is the partial correlogram
of Figure 36 and the corresponding values in Table 16.

Figure 36. Partial correlogram of the logged
rainfall anomalies of data set RN

TABLE 16


ESTIMATED PARTIAL-AUTOCORRELATIONS FOR
LOGGED RAINFALL ANOMALIES OF DATA SET RN


LAG VALUE LAG VALUE
eo a 14 -.040 
-.144 lug, =o) 75 
-.047 16 0 OnE 
moo 2 ey, -054 
~006 lige -.067 
-.058 Ie, - 039 
~ Oey ZO) O12 
-.034 21. =_ 0 3L 
+043 22 -.043 
-.056 js Bs e013 
.048 24 -.043 
-.041 Z5 =O B 


~047 


The model of interest is then

\tilde{z}_t = .191 \tilde{z}_{t-1} + a_t     III.3

where the random shocks {a_t} are assumed to be distributed
iid N(0, \sigma_a^2) and \sigma_a^2 is estimated as

\hat{\sigma}_a^2 = \frac{1}{N-1} \sum_t (\tilde{z}_t - .191 \tilde{z}_{t-1})^2     III.4

The goodness of this fit may be viewed in two ways.
First, are the residuals, {\hat{a}_t}, independent? Second,
are the residuals distributed as Normal (Gaussian) random
variables? A plot of the residuals follows in Figure 37.
The question of independence is addressed in Figure 38
(Correlogram), Figure 39 (Lag one plot), Figure 40 (Residuals
vs. lag one), and Table 17 (Turning points). For a discussion
of the usefulness of the turning points see Kendall [Ref. 14].
All of these displays and tests tend to indicate that the
residuals are in fact serially independent. The statistics
of the residuals are in Table 18. A Normal Plot of the
residuals (Figure 41), in which the sample is normalized by
removing the mean and scaling by the standard deviation and
then plotted on normal paper, should yield a nearly straight
line corresponding to the dashed line of the figure. The
Normal Plot accompanied by the sample histogram (Figure 42)
addresses the normality of these data. As may be seen from
the kurtosis, the fluctuations of the sample CDF near the
midpoint, and the peak of the histogram, the normality of
these data is questionable. To confirm this a chi-squared
goodness of fit test was conducted, yielding a value of 49.18
with 17 degrees of freedom, again rejecting any hypothesis
of normality at a significance level of 5x10^-5.
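
The significance level attached to a chi-squared goodness of fit value is the upper-tail probability of the chi-squared distribution. The sketch below computes it with the standard library only, using the usual series and continued-fraction expansions of the incomplete gamma function. It is an illustrative implementation with a hypothetical function name, not the routine used in the thesis.

```python
import math


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-squared variable with df degrees of freedom."""
    if df <= 0:
        raise ValueError("df must be positive")
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_front = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion of the lower regularized incomplete gamma.
        term = 1.0 / a
        total = term
        denom = a
        for _ in range(10000):
            denom += 1.0
            term *= z / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_front)
    # Continued fraction (modified Lentz) for the upper incomplete gamma.
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_front)
```

For example, `chi2_upper_tail(49.18, 17)` is of order 10^-5, while a value such as 22.11 on 17 degrees of freedom gives a tail probability near 0.18, which would not reject normality at conventional levels.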
Figure 37. First order Markov residuals from logged
rainfall anomalies of data set RN. Months 149 - 292

Figure 37b. First order Markov residuals from
logged rainfall anomalies of data set RN.
Months 149 - 292

Figure 38. Autocorrelations of residuals from
first order Markov process applied to the logged
rainfall anomalies of data set RN

Figure 39. Lag one plot of first order Markov
residuals from logged rainfall anomalies of
data set RN

Figure 40. First order Markov residuals versus
lag one data point from logged rainfall anomalies
of data set RN


TABLE 17


ACTUAL AND EXPECTED NUMBER OF TURNING POINTS
AND ACTUAL AND EXPECTED PHASE FREQUENCIES
FROM THE FIRST ORDER MARKOV RESIDUALS FROM
THE LOGGED RAINFALL ANOMALIES OF DATA SET RN


NUMBER OF TURNING POINTS = 191
E[P] = 190.667 V[P] = 15.899
PHASE LENGTHS
D OBS. E[*]
1 117 118.8
2 56 52.1
3 15 14.9
4 3 3.2
5 0 .6
6 0 .1
7 0 0.0
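
The turning point count and the expected values in a table of this kind can be sketched as below. The expressions used are the standard ones for a purely random series of length n: the expected number of turning points is 2(n-2)/3, and the expected number of phases of length d is 2(n-d-2)(d^2+3d+1)/(d+3)!. The function names are hypothetical, and the variance of the count is deliberately left out of this sketch.

```python
import math


def count_turning_points(series):
    """Number of interior points that are strict local peaks or troughs."""
    return sum(
        1
        for i in range(1, len(series) - 1)
        if (series[i] - series[i - 1]) * (series[i + 1] - series[i]) < 0
    )


def expected_turning_points(n):
    """Expected number of turning points in a random series of length n."""
    return 2.0 * (n - 2) / 3.0


def expected_phases(n, d):
    """Expected number of phases (runs between turning points) of length d."""
    return 2.0 * (n - d - 2) * (d * d + 3 * d + 1) / math.factorial(d + 3)
```

With n = 288 these give about 190.667 expected turning points and about 118.8, 52.1 and 14.9 expected phases of lengths one, two and three. An observed count close to its expectation is consistent with serial independence.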
TABLE 18


GENERAL STATISTICS OF FIRST ORDER 
MARKOV RESIDUALS FROM LOGGED RAINFALL 
ANOMALIES OF DATA SET RN 








Moments 
Mean -.001 
Variance ay 
Skewness -.066 
Kurtosis ~523 
Percentiles 
Minimum -1.141 
Lower Sixteenth - ,/45 
Lower Eight - .463 
Lower Quartile - .174 
Median = 014 
Upper Quartile Rez 
Upper Eight - 436 
Upper Sixteenth . 706 
Maximum 1.246 
Figure 41. Standardized normal plot of first
order Markov residuals from logged rainfall
anomalies of data set RN

Figure 42. Histogram of first order Markov residuals
from logged rainfall anomalies of data set RN


2. Winter Series
As stated above, the number of summer months with
zeros indicated that a look at the winter months only might
be worthwhile. Figures 43 (Winter months), 44 (Correlogram),
and 45 (Partial correlogram), which deal only with
winter months, still indicate a first order autoregressive
Markov model as:
Figure 43. Winter months only of logged rainfall
anomalies of data set RN

Figure 44. Correlogram of winter months only of
logged rainfall anomalies from data set RN

Figure 45. Partial autocorrelations of winter
months only logged rainfall anomalies from data
set RN

Now, as with the full twelve month model, a look
at the residuals yields Figures 46 (Residuals), 47
(Correlogram), 48 (Lag one plot), and 49 (Residuals vs.
lag one), and Table 19 (Turning points). It appears that
the residuals are, in fact, independent. This is similar
to the twelve month model.

The question of the normality of the residuals is
addressed by Table 20 and Figures 50 (Normal plot) and 51
(Histogram). The results of these plots and a basic
chi-squared goodness of fit value of 22.11 with 17 degrees of
freedom indicate that this winter month data set is much
more normal than was its twelve month counterpart. This
chi-squared value is significant at the .181 level.
Figure 46a. First order Markov residuals of logged
rainfall anomalies for winter months only of data
set RN. Years 1 - 12

Figure 46b. First order Markov residuals of logged
rainfall anomalies for winter months only of data
set RN. Years 12 - 24

Figure 47. Correlogram of first order Markov residuals
of logged rainfall anomalies for winter months only of
data set RN

Figure 48. Lag one plot of first order Markov residuals
from logged rainfall anomalies for winter months only of
data set RN

Figure 49. First order Markov residuals versus lag
one data point from logged rainfall anomalies of
winter month only data from data set RN


TABLE 19


ACTUAL AND EXPECTED NUMBER OF TURNING
POINTS AND ACTUAL AND EXPECTED PHASE
FREQUENCIES FROM THE FIRST ORDER MARKOV
RESIDUALS OF THE LOGGED RAINFALL ANOMALIES
OF DATA SET RN


NUMBER OF TURNING POINTS = 129
E[P] = 126.667 V[P] = 15.84
PHASE LENGTHS
D OBS. E[*]
1 82 78.8
2 38 34.5
3 7 9.9
4 1 2.1
5 1 .4
6 0 0.0
7 0 0.0
TABLE 20


GENERAL STATISTICS OF FIRST ORDER 
MARKOV RESIDUALS FROM LOGGED RAINFALL 
ANOMALIES OF DATA SET RN 


: Moments 
Mean -.001 
Variance . Da 
Skewness -. 161 
Kurtosis ee 2k: 
Percentiles 
Minimum -1.148
Lower Sixteenth - .845
Lower Eighth - .616
Lower Quartile - .368
Median SOAS) 
Upper Quartile ooo 
Upper Eight see ae 
Upper Sixteenth Se 
Maximum iO 3 
Figure 50. Standardized normal plot of first order
Markov residuals from logged rainfall anomalies of
winter months only from data set RN

Figure 51. Histogram of first order Markov residuals
from logged rainfall anomalies of winter month only
of data set RN


C. DATA SET FL

As in the previous section on the data sets, the analysis
of section III.B above carries forward fairly well to data
sets FL and SC. This section, and the following, contain
only the Figures and Tables corresponding to those in the
previous section on data set RN.

1. Twelve Month Series

Figure 52. Partial correlogram of the logged
rainfall anomalies for data set FL
TABLE 21 


ESTIMATED PARTIAL AUTOCORRELATIONS FOR 
LOGGED RAINFALL ANOMALIES OF DATA SET FL 


LAG VALUE LAG VALUE
il .185 14 = S72 
2 = 057 15 won 
3 -.069 16 -.019 
4 .077 Ly AO 
5 ~O16 18 -.048 
6 -.068 19 .006 
7 .010 20 SOO 
8 -.008 mal .003 
9 -.040 ee .018 
10 S022 23 025 
ile Roe | 24 .08l 
1 .067 25 noe 
13 .055 

\tilde{z}_t = .185 \tilde{z}_{t-1} + a_t
Figure 53a. First order Markov residuals from
logged rainfall anomalies of data set FL.
Months 1 - 296

Figure 53b. First order Markov residuals from
logged rainfall anomalies of data set FL.
Months 297 - 444.

Figure 54. Autocorrelations of residuals from first
order Markov process applied to the logged rainfall
anomalies of data set FL

Figure 55. Lag one plot of first order Markov
residuals from logged rainfall anomalies of
data set FL

Figure 56. First order Markov residuals versus
lag one data points from logged rainfall anomalies
of data set FL
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TABLE 22 


ACTUAL AND EXPECTED NUMBER OF TURNING
POINTS AND ACTUAL AND EXPECTED PHASE
FREQUENCIES FROM THE FIRST ORDER MARKOV
RESIDUALS FROM DATA SET FL


NUMBER OF TURNING POINTS = 294
E[P] = 294.667     V[P] = 15.93

PHASE LENGTHS
   D      OBS.    E[*]
   1      188    183.8
   2       79     80.7
   3       19     23.2
   4        6      5.0
   5        0       .9
   6        1       .1
   7        1      0.0
   8        0      0.0
   9        0      0.0
  10        0      0.0
TOTALS    294    293.7
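
The expected values in this table agree with the standard results for a purely random series of n observations: 2(n-2)/3 turning points, and 2(n-d-2)(d^2+3d+1)/(d+3)! phases of length d. The following Python sketch reproduces those entries. It assumes n = 444 (the number of months indicated by Figure 53b), and the function names are illustrative rather than taken from the thesis.

```python
from math import factorial


def expected_turning_points(n):
    """Expected number of turning points in a random series of length n."""
    return 2.0 * (n - 2) / 3.0


def expected_phase_counts(n, max_length=10):
    """Expected number of phases of each length d = 1..max_length."""
    return {
        d: 2.0 * (n - d - 2) * (d * d + 3 * d + 1) / factorial(d + 3)
        for d in range(1, max_length + 1)
    }


def count_turning_points(series):
    """Count interior points that are strict local maxima or minima."""
    count = 0
    for prev, cur, nxt in zip(series, series[1:], series[2:]):
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            count += 1
    return count


if __name__ == "__main__":
    n = 444  # assumed series length, taken from the month range in Figure 53b
    print(round(expected_turning_points(n), 3))
    for d, value in expected_phase_counts(n, 6).items():
        print(d, round(value, 1))
```

Comparing the observed counts with these expectations is what the table does row by row; the same two functions apply to the other data sets once n is changed.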
TABLE 23 


GENERAL STATISTICS OF FIRST ORDER MARKOV 
RESIDUALS FROM LOGGED RAINFALL ANOMALIES 
OF DATA SET FL 


Moments
    Mean                   .000
    Variance              [illegible]
    Skewness              -.044
    Kurtosis              [illegible]
Percentiles
    Minimum               -1.276
    Lower Sixteenth       -.702
    Lower Eighth          -.426
    Lower Quartile        -.156
    Median                -.012
    Upper Quartile         .184
    Upper Eighth           .481
    Upper Sixteenth        .648
    Maximum               1.124


Figure 57. Standardized normal plot of first order
Markov residuals from logged rainfall anomalies of
data set FL
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Figure 58. Histogram of first order Markov residuals 
from logged rainfall anomalies of data set FL 
This data set yielded a chi-square value of 107.66
for 17 degrees of freedom. The significance level of this value
is effectively zero (zero plus), which argues against normality
of the residuals.


2. Winter Series 
Figure 59a. Years 1 - 12 of winter months only of
logged rainfall anomalies of data set FL
Figure 59b. Years 13 - 37 of winter months only,
logged rainfall anomalies of data set FL
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Figure 60. Correlogram of winter months only, 
logged rainfall anomalies from data set FL 
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Figure 61. Partial correlogram of winter months only, 
logged rainfall anomalies from data set FL 
These displays indicate a model like


$z_t = .199\,z_{t-1} + a_t$    (III.7)
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Figure 62b. Years 25 - 37, first order Markov 
residuals of logged rainfall anomalies for 
winter months only, data set FL 
Figure 63. Correlogram of first order Markov
residuals of logged rainfall anomalies from winter
months only, data set FL
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Figure 64. Lag one plot of first order Markov 
residuals from logged rainfall anomalies of winter 
months only, data set FL 
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TABLE 24 


ACTUAL AND EXPECTED NUMBER OF TURNING 
POINTS AND ACTUAL AND EXPECTED PHASE 
FREQUENCIES FROM THE FIRST ORDER MARKOV 
RESIDUALS OF THE LOGGED RAINFALL ANOMALIES 
OF DATA SET FL 


NUMBER OF TURNING POINTS = 209
E[P] = 196     V[P] = 15.902

PHASE LENGTHS
   D      OBS.    E[*]
   1      138    122.1
   2       59     53.5
   3        8     15.4
   4        2      3.3
   5        1       .6
   6        0       .1
   7        0      0.0
   8        0      0.0
TABLE 25 


GENERAL STATISTICS OF FIRST ORDER 
MARKOV RESIDUALS FROM LOGGED RAINFALL 
ANOMALIES OF WINTER MONTHS ONLY, 
DATA SET FL 


Moments
    Mean                   .000
    Variance               .234
    Skewness              -.079
    Kurtosis              -.376
Percentiles
    Minimum               -1.274
    Lower Sixteenth       -.798
    Lower Eighth          [illegible]
    Lower Quartile        -.315
    Median                [illegible]
    Upper Quartile        [illegible]
    Upper Eighth          [illegible]
    Upper Sixteenth        .748
    Maximum               [illegible]


Figure 66. Standardized normal plot of first
order Markov residuals from logged rainfall
anomalies of winter months only, data set FL
Figure 67. Histogram of first order Markov
residuals from logged rainfall anomalies of
winter months only, data set FL
The chi-squared was calculated at 15.35 for 17
degrees of freedom. This is a significance level of .570,
thus indicating possible normality.


D. DATA SET SC
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Figure 68. Partial correlogram of the logged 
anomalies for data set SC 
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TABLE 26 


ESTIMATED PARTIAL AUTOCORRELATIONS 
FOR LOGGED RAINFALL ANOMALIES OF 
DATA SET SC 


LAG    VALUE          LAG    VALUE
  1     .096           14    [illegible]
  2    -.075           15     .004
  3    -.053           16    [illegible]
  4     .046           17    [illegible]
  5    -.004           18    [illegible]
  6    -.061           19     .004
  7     .042           20    [illegible]
  8    -.016           21    [illegible]
  9    [illegible]     22    [illegible]
 10    [illegible]     23    -.046
 11    [illegible]     24    [illegible]
 12    [illegible]     25    [illegible]
 13     .084


This information yields the model as 
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Figure 69a. First order Markov residuals from logged 
rainfall anomalies of data set SC. Months 1 - 148. 
Figure 69b. First order Markov residuals from logged
rainfall anomalies of data set SC. Months 149 - 444.
Figure 69c. First order Markov residuals from logged
rainfall anomalies of data set SC. Months 445 - 596.
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Figure 70. Autocorrelations of residuals from first 
order Markov process applied to the logged rainfall 
anomalies of data set SC 
Figure 71. Lag one plot of first order Markov
residuals from logged rainfall anomalies of
data set SC
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Figure 72. First order Markov residuals versus 
lag one data points from logged rainfall anomalies 


of data set SC 
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TABLE 27 


ACTUAL AND EXPECTED NUMBER OF TURNING
POINTS AND ACTUAL AND EXPECTED PHASE
FREQUENCIES FROM THE FIRST ORDER MARKOV
RESIDUALS OF DATA SET SC


NUMBER OF TURNING POINTS = 367
E[P] = 382.667     V[P] = 15.9

PHASE LENGTHS
   D      OBS.    E[*]
   1      226    238.8
   2       95    104.9
   3       34     30.1
   4        8      6.6
   5        2      1.2
   6        1       .2
   7        1      0.0
   8        0      0.0
   9        0      0.0
  10        0      0.0
TOTALS    367    381.7

TABLE 28 


GENERAL STATISTICS OF FIRST ORDER 
MARKOV RESIDUALS FROM LOGGED 
RAINFALL ANOMALIES OF DATA SET SC 


Moments
    Mean                   .000
    Variance               .209
    Skewness               .057 (sign illegible)
    Kurtosis              [illegible]
Percentiles
    Minimum               [illegible]
    Lower Sixteenth       -.744
    Lower Eighth          -.462
    Lower Quartile        -.205
    Median                -.030
    Upper Quartile        [illegible]
    Upper Eighth           .506
    Upper Sixteenth       [illegible]
    Maximum               [illegible]
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Figure 73. Standardized normal plot of first order 
Markov residuals from logged rainfall anomalies of 
data set SC 
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Figure 74. Histogram of first order Markov residuals 
from logged rainfall anomalies of data set SC 
This data set yielded a chi-square value of 273.95
for [illegible] degrees of freedom. This is equivalent to a
significance of zero plus.


2. Winter Series 
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Figure 75a. Years 1 - 12 of winter months only, 
logged rainfall anomalies of data set SC. 
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Figure 75b. Years 13 - 36 of winter months only, 
logged rainfall anomalies of data set SC 
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Figure 75c. Years 37 - 48 of winter months 
only, logged rainfall anomalies of data set SC 
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Figure 76. Correlogram of winter months only, 
logged rainfall anomalies from data set SC 
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Figure 77. Partial correlogram of winter months 
only, logged rainfall anomalies from data set SC 


This information indicates the model 
Figure 78a. Years 1 - 24, first order Markov residuals
of logged rainfall anomalies for winter months only,
data set SC
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Figure 78b. Years 25 - 48, first order Markov 
residuals of logged rainfall anomalies for winter 
months only, data set SC 
Figure 79. Correlogram of first order Markov residuals
of logged rainfall anomalies from winter months only,
data set SC.
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Figure 80. Lag one plot of first order Markov residuals 


from logged rainfall anomalies of winter months only, 
data set SC 
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Figure 81. First order Markov residuals versus lag 
one data point from logged rainfall anomalies of 
winter months only, data set SC 


TABLE 29 


ACTUAL AND EXPECTED NUMBER OF TURNING POINTS 
AND ACTUAL AND EXPECTED PHASE FREQUENCIES FROM 
THE FIRST ORDER MARKOV RESIDUALS OF THE 
LOGGED RAINFALL ANOMALIES OF THE WINTER MONTHS 
ONLY, DATA SET SC 


NUMBER OF TURNING POINTS = 32
E[P] = 30.667     V[P] = 15.39

PHASE LENGTHS
   D      OBS.    E[*]
   1       21     18.8
   2        8      8.1
   3        3      2.3
   4        0       .5
   5        0      0.0
   6        0      0.0
   7        0      0.0
   8        0      0.0
   9        0      0.0
  10        0      0.0
TOTALS     32     29.7







TABLE 30 


GENERAL STATISTICS OF FIRST ORDER
MARKOV RESIDUALS FROM LOGGED
RAINFALL ANOMALIES OF WINTER MONTHS
ONLY, DATA SET SC


Moments
    Mean                   .000
    Variance              [illegible]
    Skewness              [illegible]
    Kurtosis              -.363
Percentiles
    Minimum               [illegible]
    Lower Sixteenth       -.872
    Lower Eighth          [illegible]
    Lower Quartile        -.359
    Median                -.026
    Upper Quartile        [illegible]
    Upper Eighth          [illegible]
    Upper Sixteenth        .882
    Maximum               [illegible]
Figure 82. Standardized normal plot of first order Markov
residuals from logged rainfall anomalies of winter months
only, data set SC
Figure 83. Histogram of first order Markov residuals
from logged rainfall anomalies of winter months only,
data set SC

The chi-square was calculated at 16.60 for 17
degrees of freedom. This corresponds to a significance level
of .482, thus indicating probable normality.
IV. VALIDATION OF FIRST ORDER MARKOV MODELS


A. THEORY
The general model proposed by a first order Markov
process is, as stated before;


$z_t = \rho z_{t-1} + a_t$    (IV.1)


where $\{a_t\}$ are distributed iid $N(0, \sigma_a^2)$. To validate
this model, preferably independent data should be subjected
to the model, and an analysis of the residuals, or forecast
errors, made.

As stated previously, years 1975 through 1980 were
reserved for the purpose of validation. The method of
validation was to use the model to construct a series of
one step ahead forecasts. Let $e_t(1)$ be the error in a
forecast of time t+1 from the model at time t. Then the
error of the minimum mean squared error forecast (see Box
and Jenkins) is;


$e_t(1) = z_{t+1} - \rho z_t$    (IV.2)

If the model is correct, the sequence $\{e_t(1)\}$ will be
independent normally distributed with mean zero and variance
$\sigma_a^2$. In the following sections the models are applied to the
reserved data sets (which may also be found in the appendixes),
and these forecast errors are calculated. The forecast errors
are then analyzed to determine if


(1) The errors are serially independent

(2) The errors are distributed as normal random
variables with mean zero and variance $\sigma_a^2$


Since the residual analysis of the twelve month model
already indicates a poor fit, the twelve month model will
not be validated. Only the winter month models will be
checked for validity.
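
The one step ahead forecast errors described above can be sketched in Python as follows. This is a minimal illustration, not the program used in the thesis: the function names, the sample anomalies, and the value rho = 0.2 are all made up for the example, and the lag-one autocorrelation is only one simple check on serial independence.

```python
def one_step_forecast_errors(z, rho):
    """Return e_t(1) = z_{t+1} - rho * z_t for each consecutive pair in z."""
    return [z_next - rho * z_now for z_now, z_next in zip(z, z[1:])]


def lag_one_autocorrelation(errors):
    """Sample lag-one autocorrelation of the forecast errors."""
    n = len(errors)
    mean = sum(errors) / n
    denom = sum((e - mean) ** 2 for e in errors)
    numer = sum((errors[i] - mean) * (errors[i + 1] - mean)
                for i in range(n - 1))
    return numer / denom


if __name__ == "__main__":
    reserved = [0.5, -0.2, 0.1, 0.4, -0.6]  # made-up logged anomalies
    errors = one_step_forecast_errors(reserved, rho=0.2)
    print(errors)
    print(lag_one_autocorrelation(errors))
```

If the model holds, the errors should show no appreciable autocorrelation and should look like draws from a normal distribution with mean zero.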


B. DATA SET RN 

Figures 84 (Raw data) and 85 (Logged anomalies) display 
the reserved data set. The logged anomalies were formed by 
removing the means of the analyzed data, Table 4, not the 
means of the logged reserved data. This was done to remove 


any bias from the validation. 
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Figure 84. Reserved rainfall data for data set RN 
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Figure 85. Logged rainfall anomalies of reserved data set RN 


The forecast errors, Figure 86, their correlogram,
Figure 87, and independence tests, Table 31, indicate that
the errors are indeed independent.
Figure 86. Forecast errors from first order Markov
model applied to winter months of reserved data set RN
Figure 87. Correlogram of forecast errors from first
order Markov model applied to winter months of reserved
data set RN
TABLE 31


ACTUAL AND EXPECTED NUMBER OF TURNING
POINTS AND ACTUAL AND EXPECTED PHASE
FREQUENCIES FOR THE FORECAST ERRORS OF
THE FIRST ORDER MARKOV MODEL APPLIED
TO THE WINTER MONTHS OF RESERVED DATA
SET RN


NUMBER OF TURNING POINTS = 32
E[P] = 30.667     V[P] = 15.39

PHASE LENGTHS
   D      OBS.    E[*]
   1       21     18.8
   2        8      8.1
   3        3      2.3
   4        0       .5
   5        0      0.0
   6        0      0.0
   7        0      0.0
   8        0      0.0
   9        0      0.0
  10        0      0.0
TOTALS     32     29.7


The normality of the forecast errors is addressed by
Table 32 (Statistics), Figure 88 (Normal plot), Figure 89 (Histogram),
and a simple chi-squared test. The chi-squared was calculated
as 7.82 with 5 degrees of freedom, which corresponds to a
significance level of .167. However, the normality of the errors
is somewhat questionable due to the other displays.
TABLE 32


GENERAL STATISTICS OF FORECAST ERRORS FROM
THE FIRST ORDER MARKOV MODEL APPLIED TO THE
WINTER MONTHS OF RESERVED DATA SET RN


Moments
    Mean                  [illegible]
    Variance               .259
    Skewness              [illegible]
    Kurtosis              -.955
Percentiles
    Minimum               -1.156
    Lower Sixteenth       -.791
    Lower Eighth          -.634
    Lower Quartile        [illegible]
    Median                 .069
    Upper Quartile        [illegible]
    Upper Eighth          [illegible]
    Upper Sixteenth        .614
    Maximum                .855
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Figure 88. Standardized normal plot of forecast errors 
from the first order Markov model applied to the winter 
months of reserved data set RN 
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Figure 89. Histogram of forecast errors from the 
first order Markov model applied to the winter 
months of reserved data set RN 
C. DATA SET FL
As before, the similarity of results for the different
data sets allows the analysis to be portrayed using the
displays only.
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Figure 90. Reserved rainfall data for data set FL 
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Figure 91. Logged rainfall anomalies of 
reserved data set FL. 




Figure 92. Forecast errors from the first order 
Markov model applied to the winter months of 
reserved data set FL 




Figure 93. Correlogram of forecast errors from first 
order Markov model applied to the winter months of 
reserved data set FL 


TABLE 33 


ACTUAL AND EXPECTED NUMBER OF TURNING POINTS AND ACTUAL
AND EXPECTED PHASE FREQUENCIES FROM THE FORECAST ERRORS
OF THE FIRST ORDER MARKOV MODEL APPLIED TO THE WINTER
MONTHS OF RESERVED DATA SET FL


NUMBER OF TURNING POINTS = 34
E[P] = 30.667     V[P] = 15.396

PHASE LENGTHS
   D      OBS.    E[*]
   1       24     18.8
   2        8      8.1
   3        2      2.3
   4        0       .5
   5        0      0.0
   6        0      0.0
   7        0      0.0
   8        0      0.0
   9        0      0.0
  10        0      0.0
TOTALS     34     29.7
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TABLE 34 


GENERAL STATISTICS OF FORECAST ERRORS
FROM THE FIRST ORDER MARKOV MODEL APPLIED
TO THE WINTER MONTHS OF RESERVED DATA SET FL


Moments
    Mean                  -.015
    Variance              [illegible]
    Skewness              -.155
    Kurtosis              -.970
Percentiles
    Minimum               -1.031
    Lower Sixteenth       -.724
    Lower Eighth          [illegible]
    Lower Quartile        -.396
    Median                 .049
    Upper Quartile        [illegible]
    Upper Eighth           .479
    Upper Sixteenth       [illegible]
    Maximum                .904
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Figure 94. Standardized normal plot of forecast 
errors from the first order Markov model applied 
to the winter months of reserved data set FL 
Figure 95. Histogram of forecast errors from the
first order Markov model applied to the winter
months of reserved data set FL


The chi-squared statistic was calculated as 12.58 
with 7 degrees of freedom, thus yielding a significance 
level of 0.083. This statistic and the displays imply 


that the data are only marginally normal if at all. 
D. DATA SET SC

Figure 96. Reserved rainfall for data set SC
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Figure 97. Logged anomalies of reserved data set SC 
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Figure 98. Forecast errors from first order 
Markov model applied to the winter months of 
reserved data set SC 
Figure 99. Correlogram of forecast errors
from first order Markov model applied to the
winter months of reserved data set SC
TABLE 35


ACTUAL AND EXPECTED NUMBER OF TURNING POINTS AND
ACTUAL AND EXPECTED PHASE FREQUENCIES FROM THE
FORECAST ERRORS OF THE FIRST ORDER MARKOV MODEL
APPLIED TO THE WINTER MONTHS OF RESERVED DATA SET SC


NUMBER OF TURNING POINTS = 34
E[P] = 30.667     V[P] = 15.396

PHASE LENGTHS
   D      OBS.    E[*]
   1       24     18.8
   2        8      8.1
   3        2      2.3
   4        0       .5
   5        0      0.0
   6        0      0.0
   7        0      0.0
TOTALS     34     29.7
TABLE 36 


GENERAL STATISTICS OF FORECAST ERRORS
FROM THE FIRST ORDER MARKOV MODEL APPLIED
TO THE WINTER MONTHS OF RESERVED DATA SET SC


Moments
    Mean                  -.003
    Variance               .296
    Skewness              -.298
    Kurtosis              -.413
Percentiles
    Minimum               -1.386
    Lower Sixteenth       -.846
    Lower Eighth          [illegible]
    Lower Quartile        -.434
    Median                [illegible]
    Upper Quartile         .414
    Upper Eighth          [illegible]
    Upper Sixteenth       [illegible]
    Maximum               [illegible]
Figure 100. Standardized normal plot of forecast
errors from the first order Markov model applied to
the winter months of reserved data set SC
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Figure 101. Histogram of forecast errors from the 
first order Markov model applied to the winter 
months of reserved data set SC 
The chi-squared statistic was calculated as 9.71 with
7 degrees of freedom, thus yielding a significance level
of .205.


E. CONCLUSIONS 

The application of a Markovian model was indicated by
the apparent dependence of adjacent months and the apparent
lack of dependence at any other lag. The preceding
subsections, however, indicate that the first order Markovian
model is weak at best.

The structure of the data, visually, still points
toward some sort of underlying order. The following
sections attempt to discover this order.
V. 2x2 TABLES


A. THEORY

As seen in sections III and IV, the classical ARMA time
series approach does not seem to adequately describe the
data. Another technique used to explore possible relationships
is the 2x2 contingency table.

The idea to be explored is whether or not some subset of 
the data, to be called the control, may be used to predict 
in some way the behavior of another subset of the data, to 
be called the complement. Here, the data are reduced from 
monthly observations to yearly observations as described 
below. 

Let X be a subset of a year, to be called the control,
and let Y be the subset to be called the complement. It is
necessary that $X \cap Y = \emptyset$; that is, the intersection of the
two sets is empty. The data are then compared for some
quality in X and for some quality in Y. The question is
then: does the presence (or absence) of the quality in X
predict the presence (or absence) of the quality in Y? An
example of a typical table is shown below in Figure 102.

Figure 102. Typical 2x2 contingency table


The table elements, $n_{ij}$, represent the number of years
which display quality i in the control and quality j in
the complement. The marginal entries $n_{i.}$ and $n_{.j}$ represent
the number of years for which the control has quality i and
the number of years the complement has quality j, respectively.
The overall number of years, $n_{..}$, is in the lower right of
the table.

Brownlee [Ref. 6] contains a very good discussion of the
theory and use of 2x2 contingency tables. Using the notation
of Brownlee, let $\theta_{ij}$ be the probability that any given year
will have a control quality i and a complement quality j.
Then estimates of the $\theta$'s are

$\hat{\theta}_{ij} = n_{ij}/n_{..}$
$\hat{\theta}_{i.} = n_{i.}/n_{..}$    (V.1)
$\hat{\theta}_{.j} = n_{.j}/n_{..}$


If the control and complement are independent,


$\theta_{ij} = \theta_{i.}\,\theta_{.j}$    (V.2)


These simple assumptions allow for a thorough investigation
of the possible interrelationships within the data sets.
Another way to view the assumption of independence is
through the use of proportions. Thus, if the basic division
is made via the quality of the control, then the proportions


$P_1 = n_{11}/n_{1.}$ (respectively $P_2 = n_{21}/n_{2.}$)    (V.3)


represent, in words, the proportion of the years having
quality 1 (respectively 2) in the control that also have
quality 1 in the complement.

The question of independence may be approached in several
ways as described below.
1. Fisher's Exact Test

A test for the significance of any dependence was
proposed by Fisher for the case in which the marginal totals,
$n_{i.}$ and $n_{.j}$, are known a priori (cf. Brownlee [Ref. 6]).
To draw from Brownlee, knowledge of the marginals and $n_{11}$
gives knowledge of all the other elements of the table. The
probability of the event of having exactly $n_{11}$ years that
display quality 1 in both the complement and the control is;


$P(N_{11} = n_{11}) = \frac{n_{1.}!\,n_{2.}!\,n_{.1}!\,n_{.2}!}{n_{..}!\,n_{11}!\,n_{12}!\,n_{21}!\,n_{22}!}$    (V.4)

A test may then be applied, using V.4, to determine the
significance of any dependence. This test is usually applied
by simply summing these probabilities in the tail of the
distribution (V.4) in the same direction as the noted extreme.

The usual procedure to provide a two-sided test of
significance is to double a one sided figure. This procedure
is acceptable due to the symmetric appearance of the
distribution.
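
The tail summation described for Fisher's exact test can be sketched in Python as follows. The probability of a single table is written with binomial coefficients, which is algebraically the same as the ratio of factorials in V.4. The function names and the example counts are hypothetical, and the doubling rule for the two-sided figure follows the convention stated in the text (capped at one).

```python
from math import comb


def table_probability(n11, n12, n21, n22):
    """Probability of one 2x2 table with its marginal totals held fixed."""
    row1, row2 = n11 + n12, n21 + n22
    col1 = n11 + n21
    return comb(row1, n11) * comb(row2, n21) / comb(row1 + row2, col1)


def fisher_one_sided(n11, n12, n21, n22):
    """Sum table probabilities in the tail on the side of the observed n11."""
    row1, row2 = n11 + n12, n21 + n22
    col1 = n11 + n21
    lowest = max(0, col1 - row2)   # smallest n11 the margins allow
    highest = min(row1, col1)      # largest n11 the margins allow
    expected = row1 * col1 / (row1 + row2)
    if n11 >= expected:
        tail = range(n11, highest + 1)
    else:
        tail = range(lowest, n11 + 1)
    return sum(
        table_probability(k, row1 - k, col1 - k, row2 - (col1 - k))
        for k in tail
    )


def fisher_two_sided(n11, n12, n21, n22):
    """Two-sided figure obtained by doubling the one-sided tail."""
    return min(1.0, 2.0 * fisher_one_sided(n11, n12, n21, n22))


if __name__ == "__main__":
    # Hypothetical counts, not taken from the thesis tables.
    print(fisher_one_sided(3, 1, 1, 3), fisher_two_sided(3, 1, 1, 3))
```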


Under the assumptions of independence


$E[N_{11}] = \frac{n_{1.}\,n_{.1}}{n_{..}}$    (V.5)

$V[N_{11}] = \frac{n_{1.}\,n_{.1}\,n_{2.}\,n_{.2}}{n_{..}^{2}\,(n_{..} - 1)}$    (V.6)


and the random variable U defined as


$U = \frac{N_{11} - E[N_{11}]}{\sqrt{V[N_{11}]}}$    (V.7)


is asymptotically distributed as a normal random variable
with mean zero and variance one. This asymptotic result
combined with a continuity correction yields a test statistic
of

$U' = \frac{|N_{11} - E[N_{11}]| - 1/2}{\sqrt{V[N_{11}]}}$    (V.8)


The statistic U' may then be used as a test, using
standard normal tables, of the significance of any variation
from the assumption of independence. It should be noted at
this point that if the random variable U' is squared, it
will be distributed as a chi-square with one degree of freedom.
The squaring of U' with simplifying algebra yields the Yates
correction to a standard chi-squared goodness of fit statistic


$\chi^2 = \frac{n_{..}\,\left(|n_{11} n_{22} - n_{12} n_{21}| - n_{..}/2\right)^2}{n_{1.}\,n_{2.}\,n_{.1}\,n_{.2}}$    (V.9)

see Dixon and Massey [Ref. 5]. This allows the use of the
chi-square tables as an equivalent test to that of V.8.
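
As a sketch of how the Yates corrected statistic might be evaluated, the Python fragment below computes V.9 and its significance level. The significance uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so no tables are needed. The function names and the example counts are hypothetical, and setting the corrected difference to zero when it would go negative is a common convention assumed here, not something stated in the text.

```python
from math import erfc, sqrt


def yates_chi_square(n11, n12, n21, n22):
    """Yates corrected chi-square statistic for a 2x2 table (form of V.9)."""
    n = n11 + n12 + n21 + n22
    # Assumed convention: the corrected difference is not allowed below zero.
    diff = max(0.0, abs(n11 * n22 - n12 * n21) - n / 2.0)
    denom = (n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)
    return n * diff ** 2 / denom


def chi_square_1df_significance(x):
    """P(chi-square with 1 d.f. > x), i.e. P(|Z| > sqrt(x)) for standard normal Z."""
    return erfc(sqrt(x / 2.0))


if __name__ == "__main__":
    stat = yates_chi_square(3, 1, 1, 3)  # hypothetical counts
    print(stat, chi_square_1df_significance(stat))
```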
2. Odds
Subsection V.A.1 above deals with the significance
of any observed interdependence between the control and the
complement. The question of the degree of dependence should
also be addressed. The measure to be used is the odds ratio.
Using the notation of Fleiss [Ref. 3], a measure of seeing
quality 1 in the complement Y may be


$\Omega_1 = \frac{P(Y=1 \mid X=1)}{P(Y=2 \mid X=1)}$    (V.10)


this is then the odds that quality 1 will occur in the
complement given that quality 1 is present in the control. In a
similar manner;


$\Omega_2 = \frac{P(Y=1 \mid X=2)}{P(Y=2 \mid X=2)}$    (V.11)


is the odds that quality 1 will occur in the complement given
that quality 2 was observed in the control. The currently
most often used measure is the odds ratio $\omega$, or


$\omega = \frac{\Omega_1}{\Omega_2}$    (V.12)
Note that, if the appearance of quality 1 in the complement
is independent of whether or not it appears in the control,
then $\omega = 1$, while $\omega > 1$ implies that the odds of the
complement having quality 1, given that quality 1 was observed
in the control, are greater than the corresponding odds given
that quality 2 was observed in the control. This would indicate
that the control would be some sort of predictor for the
complement, relative to the selected qualities.

In the same continuity correcting spirit as was used
with the Yates chi-square, an estimate for $\omega$ may be obtained
from the table as


$\hat{\omega} = \frac{(n_{11}+.5)\,(n_{22}+.5)}{(n_{12}+.5)\,(n_{21}+.5)}$    (V.13)


with a standard error of


$s.e.(\hat{\omega}) = \hat{\omega}\,\sqrt{\frac{1}{n_{11}+.5} + \frac{1}{n_{12}+.5} + \frac{1}{n_{21}+.5} + \frac{1}{n_{22}+.5}}$    (V.14)


The natural logarithm of this odds ratio will be discussed
more fully in section VI.
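
The estimate V.13 and its standard error V.14 translate directly into a few lines of Python. The sketch below is illustrative only; the function name and the example counts are hypothetical.

```python
from math import sqrt


def corrected_odds_ratio(n11, n12, n21, n22):
    """Odds ratio estimate with 0.5 added to each cell, and its standard error."""
    a, b, c, d = n11 + 0.5, n12 + 0.5, n21 + 0.5, n22 + 0.5
    omega = (a * d) / (b * c)
    std_err = omega * sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    return omega, std_err


if __name__ == "__main__":
    # Hypothetical counts, not taken from the thesis tables.
    print(corrected_odds_ratio(3, 1, 1, 3))
```

A table whose four cells are equal gives an estimate of exactly one, the value expected under independence.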
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B. ANALYSIS 

The theory of subsection A above is applied to the three
data sets as discussed below. The control is typically taken
as a monthly anomaly, say October. Here, the quality is taken
as either a positive or a negative anomaly. Thus X=1 occurs
when the month of October falls below its mean and X=2 occurs
when it falls above its mean. The complement consists of the
sum of the rainfall for the succeeding eleven months, or in
symbols:


$X_t = R_{t,1} - \bar{R}_{\cdot 1}$

$Y_t = \sum_{m=2}^{12} R_{t,m} - \frac{1}{N} \sum_{t=1}^{N} \sum_{m=2}^{12} R_{t,m}$    (V.15)


where it is understood that X=1 when $X_t < 0$, X=2 when
$X_t > 0$, and similarly for Y.

Various control subsets are used; October through
September were investigated by themselves, as were all
adjacent pairs, triples, and four-tuples of months. For
an example, consider the spring (April, May, and June) and
its complement (July through March). In this case


$X_t = \sum_{m=7}^{9} R_{t,m} - \frac{1}{N-1} \sum_{t=1}^{N-1} \sum_{m=7}^{9} R_{t,m}$

$Y_t = \sum_{m=10}^{12} R_{t,m} + \sum_{m=1}^{6} R_{t+1,m} - \frac{1}{N-1} \sum_{t=1}^{N-1} \left( \sum_{m=10}^{12} R_{t,m} + \sum_{m=1}^{6} R_{t+1,m} \right)$    (V.16)


Equations V.15 and V.16 imply that the data are always
analyzed as deviations from the arithmetic mean. However,
the data are also analyzed as deviations from the median
and the lower quartile. In the tables to follow, 'A' refers
to both control and complement having the arithmetic mean
removed, 'M' refers to both control and complement having
their respective medians removed, and 'QL' refers to the
control having the lower quartile removed while the median
was removed from the complement.
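
The construction of a table under the three differentiators can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and sample values are hypothetical, a value exactly equal to the threshold is placed in class 2, and the lower quartile is computed with the default method of `statistics.quantiles`, which may differ from the quartile definition used in the thesis.

```python
from statistics import mean, median, quantiles


def contingency_table(control, complement, differentiator="A"):
    """Return [[n11, n12], [n21, n22]] for paired yearly control/complement totals."""
    if differentiator == "A":        # arithmetic mean removed from both
        cut_x, cut_y = mean(control), mean(complement)
    elif differentiator == "M":      # median removed from both
        cut_x, cut_y = median(control), median(complement)
    elif differentiator == "QL":     # lower quartile (control), median (complement)
        cut_x, cut_y = quantiles(control, n=4)[0], median(complement)
    else:
        raise ValueError("differentiator must be 'A', 'M' or 'QL'")
    table = [[0, 0], [0, 0]]
    for x, y in zip(control, complement):
        i = 0 if x < cut_x else 1    # class 1 = below threshold (assumed tie rule)
        j = 0 if y < cut_y else 1
        table[i][j] += 1
    return table


if __name__ == "__main__":
    january = [1, 2, 3, 10]          # made-up control totals
    rest_of_year = [5, 6, 20, 30]    # made-up complement totals
    print(contingency_table(january, rest_of_year, "A"))
    print(contingency_table(january, rest_of_year, "M"))
```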

The first four Tables (37 through 40) give the significance
levels of observed departures from independence of the
control and complement. Only those values having a Yates
corrected chi-square of greater than 1.00 are listed. The
entries represent the two-tailed probability of a random
deviation in excess of that observed. Although the cut off
criterion was the Yates chi-square, its probability agreed
with those obtained from the Fisher exact and normal tests
to the first two decimal places.


TABLE 37 


SIGNIFICANCE OF OBSERVED DEPARTURES FROM INDEPENDENCE OF 
SINGLE MONTH CONTROL VERSUS SUCCEEDING ELEVEN 
MONTH COMPLEMENTS 


Data set          RN              FL              SC
Differentiator    A   M   QL      A   M   QL      A   M   QL

Control

October PEGE we 24 poe 5 

November ee esl 2 P26 

December 

January Widmer 2 fo 2002. .04 .12 .21 .19 

February ean 

March. coe ie 

April dite: 

May ~ 14 

June 24 

July ok oe 

August 

september 54 22 
TABLE 38


SIGNIFICANCE OF OBSERVED DEPARTURES FROM INDEPENDENCE
OF PAIRS OF MONTHS VERSUS SUCCEEDING
TEN MONTH COMPLEMENTS


Data set          RN              FL              SC
Differentiator    A   M   QL      A   M   QL      A   M   QL


Control

Oct+Nov 5 de 09,5 324 

Nov+Dec nee ee ee Se 08 
Jan+Feb SUF 2 30 HOO elo (LE 9 
Feb+Mar Aditi 
Mart+Apr 

Apr+May 

Mayt+Jun ota .i2 2.24 

Jun+Jul 

Jult+Aug BAL, Aglige. 

Aug+Sep 


TABLE 39


SIGNIFICANCE OF OBSERVED DEPARTURES FROM INDEPENDENCE
OF TRIPLES OF MONTHS VERSUS SUCCEEDING
NINE-MONTH COMPLEMENTS


Data set          RN              FL              SC
Differentiator    A   M   QL      A   M   QL      A   M   QL


Control

Oct+Nov+Dec ~19 
Nov+DectJan 

DectJant+Feb : AeA 

Jan+Feb+Mar 

Feb+Mar+Apr ~31 .28 
Mar+Aprt+May ~22 
Apr+May+Jun Ss Si10) 

May+Jun+Jul ia. 30 

Jun+Jul+Aug pak, 
Jul+Aug+Sep 


39 





TABLE 40 


SIGNIFICANCE OF OBSERVED DEPARTURES FROM
INDEPENDENCE OF FOUR-TUPLES OF MONTHS VERSUS
SUCCEEDING EIGHT-MONTH COMPLEMENTS


Data set          RN              FL              SC
Differentiator    A   M   QL      A   M   QL      A   M   QL


Control
Oct+Nov+Dec+Jan 2.0 
Nov+Dec+Jan+Feb we 
Dec+Jan+Feb+Mar Bay 
Jan+Feb+MartApr oe 
Feb+Mart+tApr+May .22 ae 
Mar+Apr+May+Jun 5 OS 
Apr+May+Jun+Jul SY) TA 
May+Jun+JultAug 14 
Jun+Jul+tAug+Sep 

Several choices for predictors are suggested in the
previous tables. However, the apparent strongest candidate
for a predictor is January. The control of January by
itself, and January paired with December, are the most
consistently significant entries. Table 41 below gives
the odds ratio, V.13, for January, and January and December,
as controls.
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TABLE 41 


ODDS RATIO OF JANUARY VERSUS FEBRUARY 
THROUGH DECEMBER AND JANUARY PLUS 
DECEMBER VERSUS FEBRUARY THROUGH NOVEMBER 


Differentiator A M QL
Data set RN 
January 4.59 S95 = aie Mes 
Jan+Dec Spas 515 JES I OWIE 


Data set FL 


January Oe 4.72 30 
Jan+Dec fea 8) 5 OY 4.30 


Data set SC
January 2.45 2.49 22 3 
Jan+Dec sya i}'8 Zea 1.44 
At this point in the analysis it was decided to explore
more fully the power of January as a predictor. It should
be stated that other possibilities for predictors are
suggested by the tables, but time did not allow an exhaustive
study of all of these.


C. OTHER RESULTS

The results of section V.B suggest that a more detailed
analysis of January as a predictor is in order. The first
method tried for this was ordinary least squares regression
of the rainfall total in January versus the total for February
through December. This is the model below.


Let

$X_t = R_{t,4}$

$Y_t = \sum_{m=5}^{12} R_{t,m} + \sum_{m=1}^{3} R_{t+1,m}$    (V.17)


then assume that


$Y_t = \alpha + \beta X_t + \varepsilon_t$    (V.18)


as the standard linear model, where $\{\varepsilon_t\}$ are assumed to be
independent and identically distributed with mean zero and
variance $\sigma^2$. If the predictability of January is strong,
this model, V.18, may result in a good fit of the data.
Table 42 below is the resulting ANOVA for this regression.
As may easily be seen, the model does not appear to have any
significance.
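
For reference, the quantities reported in the ANOVA tables (R-squared, standard error of estimate, F, and the T value for Beta) can be obtained from a simple least squares fit as sketched below. This is a generic illustration in Python, not the program used in the thesis; the function name and the sample data are hypothetical.

```python
from math import sqrt


def simple_ols(x, y):
    """Fit y = alpha + beta * x by least squares and return summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    ss_reg = ss_total - ss_resid
    ms_resid = ss_resid / (n - 2)          # residual mean square
    se_beta = sqrt(ms_resid / sxx)
    return {
        "alpha": alpha,
        "beta": beta,
        "r_squared": ss_reg / ss_total,
        "std_error_of_estimate": sqrt(ms_resid),
        "f": ss_reg / ms_resid,            # 1 and n-2 degrees of freedom
        "t_beta": beta / se_beta,
    }


if __name__ == "__main__":
    january = [1.0, 2.0, 3.0, 4.0]         # made-up control totals
    rest_of_year = [2.0, 4.0, 5.0, 9.0]    # made-up complement totals
    print(simple_ols(january, rest_of_year))
```

With a single regressor the F value equals the square of the T value for Beta, which gives a quick consistency check on any one block of the table.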

TABLE 42 


ANOVA FOR REGRESSION OF SIMPLE 
LINEAR MODEL FOR ALL DATA SETS 


R-squared = .1033


Standard error of estimate = 4.1574          RN
SOURCE         DF        SS          MS         F
Total          22     404.798
Regression      1      41.821      41.821     2.42
Jan-Control     1      41.821      41.821     2.42
Residual       21     362.977      17.285
Variable     Coefficient    Standard error    T
Alpha        [illegible]    [illegible]       [illegible]
Beta            .5993           .385          1.56
90% Confidence Limits
             Lower limit    Upper limit
Alpha          -1.1817      [illegible]
Beta            -.0148         1.2133
R-squared = .2066


Standard error of estimate = 4.2804          FL
AOV
SOURCE         DF        SS          MS         F
Total          35     [illegible]
Regression      1     [illegible]  [illegible]  [illegible]
Jan-Control     1     [illegible]  [illegible]  8.86
Residual       34     622.945      [illegible]
Variable     Coefficient    Standard error    T
Alpha          -.2670       [illegible]       [illegible]
Beta           1.0095           .339          2.98
90% Confidence Limits
             Lower limit    Upper limit
Alpha          -.2670       [illegible]
Beta         [illegible]    [illegible]


R-squared = .0426


Standard error of estimate = 6.1508          SC
AOV
SOURCE         DF        SS          MS         F
Total          46     [illegible]
Regression      1     [illegible]  [illegible]  [illegible]
Jan-Control     1     [illegible]  [illegible]  [illegible]
Residual       45    1702.444      37.832
Variable     Coefficient    Standard error    T
Alpha          -.2771       [illegible]       [illegible]
Beta            .2455       [illegible]       [illegible]
90% Confidence Limits
             Lower limit    Upper limit
Alpha          -1.6766      [illegible]
Beta            -.0446      [illegible]


The same model with means removed, V.19 below, was tried 
and, although slightly better, is still not strong. 


$$Y_t - \bar{Y}_{\cdot} = \alpha + \beta (X_t - \bar{X}_{\cdot}) + \varepsilon_t \qquad \text{(V.19)}$$
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TABLE 43 


ANOVA FOR REGRESSION OF SIMPLE LINEAR 
MODEL WITH MEANS REMOVED FOR ALL DATA SETS 


R-squared = .1033 


Standard error of estimate = 4.1575 RN 
AOV 
SOURCE DF SS MS F 
Total 22 404.798 
Regression aL fheo2 a de eeoeag E 2ea2 
Jan-Control al 4ieezczk . aac? I oa 
Residual Dae 362.977 ic oD 
Variable Coefficient Standard error Ay 
Alpha US Ee Na Iss sil 6233 
Beta 5993 BSS ie sye 
90% Confidence Limits 
Lower limit Upper limit 
Alpha ce 36 132248 3 
-.0148 12133 


R-squared = .1747 


Standard error of estimate = 4.4276 Ee 
AOV 
SOURCE DF SS MS ig 
Total SiS 807.618 
Regression iL 141.108 141.108 doc 
Jan-Control i} 141.108 Leib eT! 7220 
Residual 34 666.510 19.603 
Variable Coefficient Standard error At 
Alpha 10.9860 en he 7aG2 
Beta .9414 eloyl 2.68 
90% Confidence Limits 
Lower limit Upper limit 
Alpha 8.7236 eee? 4 Ors 
Beta - 3909 1.4920 
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R-squared = .5617 Sc 


Standard error of estimate = .7589 
AOV 
SOURCE DF oS MS 
Total 7 7.884 
Regression di 4.428 4.428 
Jan-Control 1b 4.428 4.428 
Residual 6 3.456 .576 
Variable Coefficient Standard error 
Alpha - .1946 ee OL: 
Beta 1 PF ALO. 7o3! 
90% Confidence Limits 
Lower limit Upper limit 
Alpha -. 7040 - 3148 
Beta -6362 ee Oa 


The amount of rainfall in January does not appear to be 
a strong predictor for the amount of rainfall in February 
through December. This seems to indicate that the relationship 
between January rainfall and rainfall during the next 
eleven months is not as strong as expected. However, a 
further technique is available, that of log-odds and logistic 
regression, which are the subjects of the next section. 


VI. LOGISTIC ANALYSIS 


A. THEORY 


The logistic analysis to be described in this section was 
developed from Gaver [Ref. 2] and Fleiss [Ref. 3]. This 
analysis derives from the model of I.5 as stated in the 
Introduction. 

The basic approach is to view the complement as having a 
binary representation, with success being defined as a complement 
above its mean (see equations I.3 and I.4) and failure 
as the complement below its mean. The problem then is to find 
the conditional probability of a success (the complement being 
above its mean for a year) given that the control (January 
rainfall) takes on a particular value. 


The control is now taken to be the logged rainfall anomaly 
of January and is found in equation I.2, repeated below. 

$$X_t = \ln(R_{t,4}) - \frac{1}{N} \sum_{s=1}^{N} \ln(R_{s,4}) \qquad \text{(VI.1)}$$

If the probability of success, given $X_t$, is written as 

$$P(\text{Success} \mid X_t) = \theta_t \qquad \text{(VI.2)}$$

a superficially attractive model for $\theta_t$ is 

$$\theta_t = \alpha + \beta X_t + \varepsilon_t \qquad \text{(VI.3)}$$

This model has two difficulties, the worst of which is 
that probabilities of greater than one or less than zero are 
allowed. Secondly, the $\theta_t$ are not available in proportion 
form to fit the model. 

Initially, the problem of estimating $\theta_t$ is approached 
by grouping the data. Section III indicated that each year 
seemed to be independent of the next and that there was no 
trend. This allows for the ordering of the X's into their 
order statistics $X_{(t)}$. Once the ordering has been done, 
non-overlapping groups of arbitrary size may be formed as 
shown. Let $X_1, X_2, X_3, \ldots, X_{12}$ be a series of 12 years from an 
arbitrary data set, with associated order statistics 
$X_{(1)}, X_{(2)}, X_{(3)}, \ldots, X_{(12)}$. Then if groups of size three are 
desired, the data are partitioned as below. 

$$X_{(1)}, X_{(2)}, X_{(3)} \mid X_{(4)}, X_{(5)}, X_{(6)} \mid \cdots \mid X_{(10)}, X_{(11)}, X_{(12)}$$


Given these groups, let $X_j$ be a measure of location for 
the $j$th group. This analysis used the median; therefore, 
for groups of size three, 
$X_j = X_{(3j-1)}$. Also associated with each group is $R_j$ (not 
rainfall), the number of successes in group $j$, and $n_j$, the 
number of elements in group $j$. From this setup, the required 
probabilities may be estimated as 

$$\theta_j = R_j / n_j \qquad \text{(VI.4)}$$


A solution to the first problem, that of the model yielding 
probabilities outside of (0,1), is to use the log odds, $\phi_j$, 
instead of $\theta_j$, where 

$$\text{Log odds} = \phi_j = \ln \frac{\theta_j}{1 - \theta_j} \qquad \text{(VI.5)}$$

which is equivalent to the logarithm of the odds ratio as 
given in V.13. Gaver [Ref. 2] suggests that a correction of 
.5 be applied to guard against the problem of 0 and 1 within 
the logarithm and to reduce the bias. The statistic then 
becomes 

$$\phi_j = \ln \frac{R_j + .5}{n_j - R_j + .5} \qquad \text{(VI.6)}$$

The temptation is to go directly to the model 

$$\phi_j = \alpha + \beta X_j \qquad \text{(VI.7)}$$

yet the variance of $\phi_j$ depends on $n_j$ and $\theta_j$ and so is not constant. This 
suggests the need for a weighting scheme. 
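
The grouping procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `grouped_logits` and the sample data are hypothetical, the corrected log odds of VI.6 is assumed to take the common form $\ln[(R_j + .5)/(n_j - R_j + .5)]$, and a short final group is simply kept as it falls.

```python
import numpy as np

def grouped_logits(x, y, size):
    """Order by x, cut into consecutive groups, and return one row per group:
    (median x, number of successes R_j, group size n_j, corrected log odds)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)
    order = np.argsort(x, kind="stable")     # order statistics of the control
    x, y = x[order], y[order]
    rows = []
    for start in range(0, x.size, size):
        xg = x[start:start + size]
        yg = y[start:start + size]
        r, n = int(yg.sum()), int(yg.size)
        phi = np.log((r + 0.5) / (n - r + 0.5))   # 0.5 correction keeps the log finite
        rows.append((float(np.median(xg)), r, n, float(phi)))
    return rows

# Hypothetical logged January anomalies and success indicators (1 = complement above its mean).
x_demo = [0.4, -1.2, 0.9, -0.6, 0.1, -1.0]
y_demo = [0, 0, 1, 0, 1, 0]
for row in grouped_logits(x_demo, y_demo, size=3):
    print(row)
```

The rows returned (group median and corrected log odds) are the points to which a straight line such as VI.7 would then be fitted.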

The weighting scheme used was that of iteratively 
reweighted least squares (IRWLS), using the bi-weights. 
This method is discussed in detail in Mosteller and Tukey 
[Ref. 13]. 
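
For readers unfamiliar with the technique, the following Python sketch shows one common form of iteratively reweighted least squares with biweights. It is not taken from the thesis: the function name `biweight_irwls`, the use of the median absolute residual as the scale, and the stopping rule are assumptions made for illustration, and the tuning constant `c` plays the role of the values C=9 and C=4 quoted in the tables.

```python
import numpy as np

def biweight_irwls(x, y, c=9.0, iterations=20):
    """Fit y = alpha + beta * x, repeatedly down-weighting large residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    weights = np.ones_like(y)                 # first pass is ordinary least squares
    for _ in range(iterations):
        sw = np.sqrt(weights)
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        resid = y - design @ coef
        scale = np.median(np.abs(resid))      # assumed robust scale estimate
        if scale < 1e-12:                     # essentially exact fit; stop reweighting
            break
        u = resid / (c * scale)
        # Biweight: (1 - u^2)^2 inside the cutoff, zero outside it.
        weights = np.where(np.abs(u) < 1.0, (1.0 - u ** 2) ** 2, 0.0)
    return coef, weights

# Hypothetical points on the line y = 1 + 2x with one gross outlier at x = 3.
x_demo = np.arange(8.0)
y_demo = 1.0 + 2.0 * x_demo
y_demo[3] += 10.0
coef, weights = biweight_irwls(x_demo, y_demo, c=4.0)
print(coef, weights)
```

A smaller constant such as 4 rejects outlying points more aggressively than 9, which is why it is often described as the more robust choice.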

Although grouping of the data and the model of VI.7 
provide adequate representation of the underlying structure, 
the logistic model itself, I.5, when viewed through the eyes 
of maximum likelihood theory can yield more insight. 


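The model is assumed to be 

$$\theta_t = \frac{e^{\alpha + \beta X_t}}{1 + e^{\alpha + \beta X_t}} \qquad \text{(VI.8)}$$

where the $X_t$ are independent. 

The likelihood function is then 

$$L(X, Y; \alpha, \beta) = \prod_{t=1}^{N} \left( \frac{e^{\alpha + \beta X_t}}{1 + e^{\alpha + \beta X_t}} \right)^{Y_t} \left( \frac{1}{1 + e^{\alpha + \beta X_t}} \right)^{1 - Y_t} = \frac{\prod_{t=1}^{N} e^{(\alpha + \beta X_t) Y_t}}{\prod_{t=1}^{N} \left( 1 + e^{\alpha + \beta X_t} \right)} \qquad \text{(VI.9)}$$

and the log-likelihood is 

$$\ell(X, Y; \alpha, \beta) = \alpha \sum_{t=1}^{N} Y_t + \beta \sum_{t=1}^{N} X_t Y_t - \sum_{t=1}^{N} \ln\left( 1 + e^{\alpha + \beta X_t} \right) \qquad \text{(VI.10)}$$

The gradient and Hessian of the log-likelihood are 

$$\frac{\partial \ell}{\partial \alpha} = \sum_{t=1}^{N} Y_t - \sum_{t=1}^{N} \frac{\psi_t}{1 + \psi_t}, \qquad \frac{\partial \ell}{\partial \beta} = \sum_{t=1}^{N} X_t Y_t - \sum_{t=1}^{N} \frac{X_t \psi_t}{1 + \psi_t}$$

and 

$$H = \begin{pmatrix} -\sum_{t=1}^{N} \dfrac{\psi_t}{(1 + \psi_t)^2} & -\sum_{t=1}^{N} \dfrac{X_t \psi_t}{(1 + \psi_t)^2} \\ -\sum_{t=1}^{N} \dfrac{X_t \psi_t}{(1 + \psi_t)^2} & -\sum_{t=1}^{N} \dfrac{X_t^2 \psi_t}{(1 + \psi_t)^2} \end{pmatrix}$$

where 

$$\psi_t = e^{\alpha + \beta X_t}$$

A simple way to solve for $\alpha$ and $\beta$ is to use Newton's 
method, 

$$\begin{pmatrix} \alpha \\ \beta \end{pmatrix}_{k+1} = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}_{k} - H_k^{-1} \nabla \ell_k$$

since all necessary elements may be calculated in one pass of 
the computer algorithm. 

One beneficial byproduct of the maximum likelihood 
approach is the inverse of the asymptotic information matrix, 
$(-H)^{-1}$. Gaver [Ref. 2] states that the diagonal elements of this matrix 
provide good estimates of $V[\alpha]$ and $V[\beta]$ under assumptions 
of normality. 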
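
A minimal Python sketch of this Newton iteration is given below. It is an illustration, not the program used for the thesis: the function names, the starting point at zero, the convergence tolerance, and the sample data are all assumptions. The variance estimates are taken from the inverse of the negated Hessian at the final point.

```python
import numpy as np

def score_and_hessian(params, x, y):
    """Gradient and Hessian of the logistic log-likelihood at params = (alpha, beta)."""
    psi = np.exp(params[0] + params[1] * x)
    p = psi / (1.0 + psi)             # fitted probability of success
    w = psi / (1.0 + psi) ** 2        # equals p * (1 - p)
    grad = np.array([np.sum(y - p), np.sum(x * (y - p))])
    hess = -np.array([[np.sum(w), np.sum(x * w)],
                      [np.sum(x * w), np.sum(x * x * w)]])
    return grad, hess

def logistic_newton(x, y, max_iter=50, tol=1e-10):
    """Maximize the log-likelihood by Newton's method; return estimates and covariance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    params = np.zeros(2)              # assumed starting values alpha = beta = 0
    for _ in range(max_iter):
        grad, hess = score_and_hessian(params, x, y)
        step = np.linalg.solve(hess, grad)
        params = params - step        # Newton update
        if np.max(np.abs(step)) < tol:
            break
    _, hess = score_and_hessian(params, x, y)
    return params, np.linalg.inv(-hess)

# Hypothetical logged January anomalies and success indicators (not thesis data).
x_demo = [-1.2, -0.8, -0.5, -0.1, 0.2, 0.4, 0.7, 1.1]
y_demo = [0, 0, 1, 0, 1, 0, 1, 1]
estimates, covariance = logistic_newton(x_demo, y_demo)
print(estimates, np.diag(covariance))
```

The iteration assumes the successes and failures overlap in the control; if they separate perfectly, the maximum likelihood estimate of the slope is not finite and the steps will not settle.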


B. ANALYSIS 

1. Grouped Data 

The first approach taken was that of grouping the 
data as described above. Groups of 3, 4, and 5 were used, 
as were two separate methods of regression, ordinary least 
squares (OLS), and iteratively reweighted least squares 
(IRWLS). Tables 44 (RN), 45 (FL), and 46 (SC) present the 
data and Tables 47 (OLS), and 48 (IRWLS) present the results 
of the regressions. 




TABLE 44 


DATA SET RN, LOGGED JANUARY ANOMALIES AND 
SUCCESSES FOR GROUPED AND UNGROUPED FORMS 


UNGROUPED GROUPED 
Xx Y 5 X Y 
-1.22 0 group=3 
-1.09 0 i -1.09 0 
- .66 0 Z - .46 i 
=. 36 0 5 - .19 i 
- .46 0 + 50) dL 2 
- .36 al 3. a 2 
= OL 0 6 oo Ze 
- .18 1 i 43H) 2 
- .17 0 8 oS 2 
aes io i group=4 
Ol IL a -.38 0 
ce 0 2 -.34 Z 
ees i 3 -.03 eZ 
6 0 “ 220 2 
24 1 5 ~47 + 
a2 6 0 6 ~94 2 
256 ie group=5 
. 46 i i -.66 0 
~48 Je Z eg 3 
520, 1 5 wt 3 
none 0 + - 46 4 
~94 i 5 ~94 Ze 
OL. 
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TABLE 45 


DATA SET FL, LOGGED JANUARY ANOMALIES AND 
SUCCESSES FOR GROUPED AND UNGROUPED DATA 


K 


UNGROUPED GROUPED 

t yi Me 5 x 

al -3.41 a group=3 

2 -1.82 0 a =o 

5 -1.14 0 2 =1°370,0 

4 -1.02 0 e) - .46 

5 -1.00 0 4 - .20 

6 = 4 0 5 gisi 

u =a 4] 0 6 -14 

8 - .46 0 7 Za) 

2 - .39 0 8 34 
10 - .28 1 9 se 
tak = 6 418 0 10 Sore 
1 - .14 1 dei: Bet) 
3 - .04 il a2 1.14 
14 ei 1 Q group=4 
ils deal 0 I -1.48 
IS Pyles: 0 Zz - ./70 
7 14 0 3 - .24 
18 6 1 4 aula: 
9 wu 0 5 26 
20 2 0 6 ere 
ail <2 5 0 7 530 
a2 +30 1 8 Payal 
23 . 34 IL 2 beg ilk 
24 236 0 group=5 
ie - 46 IL 1 -1.14 
26 730 0 2 - .46 
27 Fay ay 3 - .04 
28 5) 0 4 wG 
eo 5 (5 ak iL is) 34 
30 Boy al i 6 ol 
cal or 1 7 ASS 
SZ iD Ik 
25 Oe 1 
34 40S ik 
3S) 1.14 iL 
36 ieee 1 
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TABLE 46 


DATA SET SC, LOGGED JANUARY ANOMALIES AND 
SUCCESSES FOR GROUPED AND UNGROUPED FORMS 


UNGROUPED 
Xx 

-4.24 

-1.41 

-1.38 
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TABLE 47a 


ORDINARY LEAST SQUARES REGRESSION WITH 
THE MODEL OF EQUATION VI.7 FOR DATA SET RN 


R-squared = .0423 


Standard error of estimate = 6.1508 RN 
GROUP=3 
AOQV 
SOURCE DF SS MS F 
motal 46 Was; & ISIE 
Regression aL 15. O10 1 US eleits) 7 2100 
Jan-Control IL Ta2007 72.00 / 2.00 
Residual 45 1702.444 37.332 
Variable Coefficient Standard error At 
Alpha ILG Se Bie) 5) ILE Fe 8.69 
Beta eA35 5 a SOs eae 
90% Confidence Limits 
Lower limit Upper limit 
Alpha ii cron 0) 17.1988 
Beta - .0446 Ses 7 
R-squared = .4068 RN 
Standard error of estimate = 1.2099 GROUP=4 
AOV 
SOURCE DF Ss MS igh 
Total 5 9.871 
Regression a 4.016 4.016 2.74 
Jan-Control iL 4.016 4.016 gee 4 
Residual 4 Syst aye 1.464 
Variable Coefficient Standard error AU 
Alpha -. 1696 oe, -.33 
Beta oe 7 1.068 1.66 
90% Confidence Limits 
Lower limit Upper limit 
Alpha -~1.1675 . 8284 
Beta 1.7688 Six 03 2 tele: 
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R-squared = .5973 


Standard error of estimate = .9995 
AOV 
SOURCE DF Ss MS 
motal 4 7.442 
Regression alk 4.446 4.446 
Jan-Control ih 4.446 4.446 
Residual 3 2.997 .999 
Variable Coefficient Standard error 
Alpha . 461 
Beta ous 
90% Confidence Limits 
Lower limit Upper limit 
Alpha ~1.2358 . 7064 
Beta 3.4345 


TABLE 47b 


GROUP=5 


ORDINARY LEAST SQUARES REGRESSION WITH 
THE MODEL OF EQUATION VI.7 FOR DATA SET FL 


R-squared = .2862 


Standard error of estimate = .8949 
AOV 

SOURCE Drs Ss MS 
Motal deak 20.591 
Regression aL Pesos We Shs 
Jan-Control alt 7.998 7.998 
Residual 10 1T22593 1259 
Variable Coefficient Standard error 

Alpha -.1459 4 

Beta 20s 2 ae, 


90% Confidence Limits 


Lower limit 
Alpha -.6871 
Beta N35 90 


Upper limit 
~4944 
e749 2 


ILS) 


lab; 
GROUP=3 


De Dy 





R-sgquared = .5370 


Standard error of estimate = 1.0484 
AOV 
SOURCE DF Ss MS 
Total 8 iGo. 615 
Regression i 8G BAe S22 
Jan-Control 1 8.922 Sa922 
Residual 7 7.963 1.099 
Variable Coefficient StandarLa Breor 
Alpha ~.1148 £350 
Beta Meso 57 2 ~476 
90% Confidence Limits 
Lower limit Upper limit 
Alpha -.7237 ~4941 
Beta 5297 2.1847 


R-squared = .6311 


Standard error of estimate = .8809 
AOV 

SOURCE DF SS MS 
Total 6 O25 29 
Regression i 6.639 Sees Sie) 
Jan-Control iL 6.639 6.639 
Residual 5 3.880 2116 
Variable Coefficient Standard Error 

Alpha -.1393 ore 4 

Beta S225 eo 

90% Confidence Limits 
Lower limit Upper limit 
Alpha -.7525 may 20 
Beta ese TiS Boa} 


a6 


EG 
GROUP=4 


1a fF 
GROUP=5 





TAB Ea C 


ORDINARY LEAST SQUARES REGRESSION WITH 


Peer OL BOUATEON Vi.7 FOR DATA SET SC 


— 2 


R-squared = .1404 


Standard error of estimate -—- .990L1 
AOV 

SOURCE DF Ss MS 
Total ake 15.965 
Regression 1 Z2e241 Da a 
Jan-Control IL Pa Pa ML 2.241 
Residual 14 oa 24 .980 
Variable Coefficient Standard Error 

Alpha -.3039 ~249 

Beta Oo 2355 

90% Confidence Limits 
Lower limit Upper limit 

Alpha -. 7090 Od 2 

Beta -.0386 1.0504 
R-squared = .1872 
Standard error of estimate = .8959 

AOV 

SOURCE DF SS MS 
Total slg 9875 
Regression iL 1.848 1845 
Jan-Control 1 peste es: 1.848 
Residual a0, 8.026 . 803 
Variable Coefficient Standard Error 

Alpha -.2284 260 

Beta 9447 . 359 

90% Confidence Limits 
Lower limit Upper limit 
Alpha -.6618 «2051 
Beta noa4a 7 i439 


ILS 7 


16 
GROUP=3 


aL 
S22 
enon! 


SC 


GROUP=4 





R=squared = .2862 sie 


Standard error of estimate = .8949 GROUP=5 
ADV 
SOURCE DF SS MS F 
total 9 8.976 
Regression i Zoo o) Ze Gis Serge 
Jan-Control Ji 2.569 2.569 Sea ik 
Residual 8 6.408 .-801 
Variable Coefficient Standard Error au 
Alpha -.1838 ee -.64 
Beta 650.3 ~ Sey ISAS 
90% Confidence Limits 
Lower limit Weper liana t 
Alpha -.6736 7306 
Beta S050) S 1g Ass 
TABLE 48 


ITERATIVELY REWEIGHTED LEAST SQUARES 
REGRESSION USING BI-WEIGHTS FOR THE MODEL OF EQUATION VI.7 


GROUP SIZE 3 


c=9 Ot 8 
Data sets RN 048 ieOws 
FL -.063 .664 _ 
SC -.116 Rolo le 
Cc=4 
Data sets RN O52 iNSaN@) SHE 
FL -.197 1.049 
Se ~154 - 480 
GROUP SIZE 4 
c=9 
Data sets RN -.103 =OO5 
FL -.093 S705 
SC -.169 HZ 60 
GROUP SIZE 5 
C=9 
Data sets RN -.065 - 865 
FL -.107 e723 
SC -.126 oles 
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2. Maximum Likelihood 

Table 49 displays the final points for each data set 
along with the inverse Hessian at that point. Figures 103 
(RN), 104 (FL), and 105 (SC) are interesting in that they 
portray the contours of the likelihood functions for each 
data set. These contours show the variance of the estimated 
parameters in a graphic way. Note how data sets RN and FL 
seem to have some sort of horizontal ridge indicating a 
good pick of the slope parameter, yet the contours of data 
set SC are almost circular about the origin of the axes, 
indicating no significant difference from zero for either 
parameter. 

TABLE 49 


MAXIMUM LIKELIHOOD ESTIMATES OF $\alpha$ AND $\beta$ ALONG WITH 
ESTIMATES OF THEIR VARIANCE FOR ALL THREE DATA SETS 


Data set RN 


a= .062 1 Bods) =A 4 3 
H, = : 
8 = 2.918 L 5 One ie 72.0 
V{a] = .257 
v[B] = 1.720 
m Data set FL 
a= --.1L7L 129 ~.040 
‘yt = 
= Roos L -.040 .298 
v[a] = .129 
V[g] = .298 
m Data set SC 
a= -.303 .088 -.035 
aot lf 
== .171 L Se Os5 eibieg 
v[a] = .088 
vV[s8] = .113 
Figure 103. Contours of log likelihood 
function for data set RN 

Figure 104. Contours of log likelihood 
function for data set FL 
C. DISCUSSION 

A recapitulation of all the parameter values is in 
Table 50. Some interesting observations to be made from 
this table are: 

(1) The slope for data set RN, as found by maximum 
likelihood, is much greater than that of any other method 
or data set. The first temptation is to treat this as an 
outlier, yet the evidence of the contour plot and of the 
validation of the next section tends to back up this number. 
The reason for this difference is a possible subject for 
further research. 

(2) Except for the intercept values, the slopes of data 
sets RN and FL seem to be fairly consistent within and between 
regression methods. This comment is made in light of the 
difference of these data sets and that of SC. 

(3) Data sets RN and FL seem to be similar in many ways, 
yet data set SC appears to be different in both degree and 
significance. 

No other strong pattern is apparent in these parameter 
values. Graphical displays of the parameters, as used with 
grouped data, are given in Figures 106 (RN), 107 (FL), and 
108 (SC). Again, note the significant difference of the 
maximum likelihood line for data set RN. Table 51 contains 
the data points from which Figures 106, 107 and 108 were 
drawn. 

After viewing these figures, the maximum likelihood 
approach is the preferred method for the Peninsula data sets, 
whereas the robust (C=4) IRWLS may be best for the Valley data 
set. Although fits were made to data set SC, it appears as 
if no great significance has been found. 
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Method 


Jee 


Maximum 
Likelihood 


OLS 
Group=3 
Group=4 


Group=5 


IRWLS 
Group=3 
C=9 


Group=3 
Cc=4 


Group 4 
C=9 


Group=5 
C=9 


TABLE 50 


PARAMETER FIT RECAPITULATION 


FOR ALL DATA SETS 


RN 
Ges S02 
Bo 29 8 
@ 2 - .195 
Bs: ee 
a 3 - .170 
Sis 1.769 
a: - .265 
Bare 719 
a: PoO4s 
pao. O13 
a: S10 sf 
Bes Js (Gal 
Chie -.103 
Brae (las 
Gs -.065 
Denes Poo 
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Figure 106. Estimated probability of greater-than- 
average total rest-of-year rainfall versus the 
anomaly of logged rainfall for January for data set RN 
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Figure 107. Estimated probability of greater- 
than-average total rest-of-year rainfall versus 
the anomaly of logged rainfall for January for 
data set FL 
Figure 108. Estimated probability of greater-than- 
average total rest-of-year rainfall versus the 
anomaly of logged rainfall for January for data 
set SC 
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ACTUAL VALUES 


LOGGED ACTUAL 
ANOMALY VALUE 
= 5 ONY 0.000 
- .460 330 
- .190 #3380 
.010 As) 168 
ro 0 pow0 
7500 wo 0 
7200 513) Fe 
5 she N20. 

ACTUAL VALUES 

LOGGED ACTUAL 
ANOMALY VALUE 
pl. 820 rere’, 
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- .460 0.000 
= 748 6:70 
5 IG. 2350 
140 750 
2.00 0.000 
. 340 51S 16, 
700 = (5) 7/16, 
.610 20.0 
2/20 15E 0100, 
E140 iO 0 


TABLE 5la 


PORSMODEL FITS OF RN 
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~042 22o0 
VII. VALIDATION OF LOGISTIC MODELS 


A. GENERAL 

The various parameters that were estimated in the previous 
section all may be subject to some sort of validation. However, 
this paper will only view the validation for the maximum 
likelihood approach on all data sets and the IRWLS (C=4) 
approach on data set SC. The validation will be conducted 
against the reserved, independent data sets of years 1975 
through 1980. These are the same data sets as used in section 
IV. 

Table 52 portrays the reserved data, in a form for logistic 
analysis, and Figure 109 is a display of the derived contingency 
tables for the reserved data. 
TABLE 52 


RESERVED DATA IN FORM FOR THE 
LOGISTIC ANALYSIS 


DATA SET RN 


YEAR X COMPLEMENT 
VS hg es: Sy SS, BZ 2 
1376 -2.446 ae 2.3 
dS BF -.477 11.14 
O73 A iatese een 2 
SAS) TES, ~542 Sigs eae ie 
ILENE, oe --= 
DATA SET FL 
eS. -.470 Zi DS 
976 =2.610 Ika) es Sys. 
OV 7 -.470 et 0 
od 8 ~734 SLES Si 
O79 ~547 Sethe 
TERI ,ooe == = 
DATA SET SC 
ILS gs. 45) SS ee 7 
JES IRAS =S5 90S ee 
IES) 7 Oe Ores 
Ie eyes: 1.054 Zoe 
ESTES) owe Ibe 5 Sie 
1980 ~ 405 === 
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Data set RN 


Complement 
Control EG | 0 | 3 
+ O | 2 | 3 
3 [2 | 5 


Data set FL 


Complement 
- oa 
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Complement 
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Ss A ee 
Gon exo | 
2 2) Ae ie 
Sie 2. 
Figure 109. 2x2 contingency tables of reserved data 
controlled by the anomaly of January rainfall. The 
complement is the anomaly of the rest-of-year rainfall. 
B. RESULTS 
1. Data Set RN 
The model proposed by the maximum likelihood 
parameters is 

$$\hat{\theta}_t = \frac{e^{.0618 + 2.9183 X_t}}{1 + e^{.0618 + 2.9183 X_t}} \qquad \text{(VII.1)}$$

This model, when applied to the reserved data, yields Table 53. 


TABLE 53 


RESULTS OF LOGISTIC VALIDATION 
ON DATA SET RN 


YEAR X Y $\hat{\theta}_t$ 

Ios =e 9 0 za 

1976 -~2.746 0 -1a)(8 

1977 -.477 0 Ea 

1978 4 els i oS 

IFS, sae 1 . 84 

1980 aoe - sil ie 
The $\hat{\theta}_t$ is interpreted, again, as the conditional 
probability that the complement, the total rainfall for 
February through December, will be above its mean value, 
given that the logged January anomaly was $X_t$. It 
appears that this model tends to predict the direction of 
the complement's deviation well. Figure 110 is a plot of the 
estimated probabilities against the actual complement anomaly. 
For an acceptable fit, this plot should show an upward to the 
right slope, which it does. 
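
As a check on how such validation probabilities arise, the short Python sketch below evaluates the logistic model at two of the reserved anomalies listed for data set RN, using the maximum likelihood values of about .0618 for the intercept and 2.9183 for the slope. The function name `success_probability` is hypothetical and the sketch is only illustrative.

```python
import math

def success_probability(x, alpha, beta):
    """Logistic model: P(complement above its mean | logged January anomaly x)."""
    z = alpha + beta * x
    return 1.0 / (1.0 + math.exp(-z))   # algebraically equal to e^z / (1 + e^z)

# Maximum likelihood values reported for data set RN.
ALPHA_RN, BETA_RN = 0.0618, 2.9183

# Two reserved-year anomalies for RN: a dry January and a wet January.
for x in (-0.477, 0.542):
    print(x, round(success_probability(x, ALPHA_RN, BETA_RN), 2))
```

A negative anomaly gives a probability well below one half and a positive anomaly gives one well above it, which is the directional behaviour the validation looks for.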
Figure 110. Plot of $\hat{\theta}_t$ versus complement 
anomalies for data set RN 


2. Data Set FL 
This data set is quite similar to the RN data, 
except that the slope parameter is only a third of that 
of RN. The model is 

$$\hat{\theta}_t = \frac{e^{-.171 + .9325 X_t}}{1 + e^{-.171 + .9325 X_t}} \qquad \text{(VII.2)}$$
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TABLE 54 


RESULTS OF LOGISTIC VALIDATION 
ON DATA SET FL 


Year X Y $\hat{\theta}_t$ 

£975 -.470 0 Ns 
1976 -2.610 0 Oy 
eo -.470 0 55 
1978 a 4 i 63 
1979 mon 0 Ras) ;e: 
1980 woo 5 - .60 


A plot of the probabilities against the complement anomalies 
is in Figure 111. 


Figure 111. Plot of $\hat{\theta}_t$ versus complement 
anomalies for data set FL 


This fit is not as good as that for data set RN. 
The outlier, or false prediction, of 1979 may not, however, 
be far out of line. The sparsity of points for which the 


complement anomaly was positive detracts from the validation 
3. Data Set SC 


The maximum likelihood model is 

$$\hat{\theta}_t^{(1)} = \frac{e^{-.3025 + .171 X_t}}{1 + e^{-.3025 + .171 X_t}} \qquad \text{(VII.3)}$$

and the IRWLS model is 

$$\hat{\theta}_t^{(2)} = \frac{e^{.1537 + .4799 X_t}}{1 + e^{.1537 + .4799 X_t}} \qquad \text{(VII.4)}$$

and the tabular results are in Table 55. 


TABLE 55 


RESULTS OF VALIDATION ON DATA SET SC 


Year X Y $\hat{\theta}_t^{(1)}$ $\hat{\theta}_t^{(2)}$ 
1975 -.535 It ~20 aly 
1976 -3.773 0 aco 2G 
JESSIE, Reais)” 0 ao4 E56 
1978 1.054 1k eetas -66 
1979 peelee Q 4 aS - 60 
1980 405 - 44 sy) - 


and the plot of the probabilities versus the complement 
anomalies is in Figure 112. 

Figure 112. Plot of $\hat{\theta}_t$ versus complement 
anomalies for maximum likelihood and IRWLS 
parameters for data set SC 


C. DISCUSSION 


The validation of the maximum likelihood models for data 
sets RN and FL appears to be acceptable. However, data set 
SC does not appear to be acceptably modeled. In fact, as 
Figure 112 shows, the complement appears to be almost independent 
of the control. This is also shown by Figure 105, 
where it can be seen that the contours are very flat and 
circular about the origin of the $(\alpha, \beta)$ coordinate system. 

The one apparent outlier of data set FL may be viewed 
as very close; therefore that model can also be assumed 
to be validated. 
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A. SUMMER MONTHS 

The further investigation of the summer months, to 
parallel the modeling of the winter months, yielded some 
interesting results. These results are shown here with 
no attempt at analysis. 

The summer months appear to be increasing in total 
rainfall and in variance. This is more true for the 
Peninsula data sets than for the Valley data set. Figures 
113 (RN), 114 (FL), and 115 (SC) show the by-month series 
of summer months. The total summer rainfall series by 
year are shown in Figures 116 (RN), 117 (FL), and 118 (SC). 
The reserved data are not included, yet they can be shown to 
continue the indicated trends. 
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Figure 113. Monthly plot of summer months 
only, means removed, for data set RN 
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Figure 114. Monthly plot of summer months 
only, means removed, for data set FL 
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Figure 115. Monthly plot of summer months 
only, means removed, for data set SC 
Figure 116. Yearly plot of total summer 
rainfall for data set RN 
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Figure 117. Yearly plot of summer month 
rainfall for data set FL 
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Figure 118. Yearly plot of total summer 
rainfall for data set SC 
B. SIGNIFICANCE OF JANUARY 

The identification of January as a possible predictor 
for its eleven-month complement raises further questions. 
One of the questions is in determining which part of the 
complement lends the most towards its predictability. 
Figure 119 is a plot of the log-odds and chi-square statistics 
for the cumulative complements. Each progressive 
column, to the right, of Figure 119 indicates these statistics 
for another cumulated month, i.e., the first column 
compares the anomalies of January and February by itself, 
the second column is a comparison of January to February 
plus March, and so on until the last column is a comparison 
of January to the entire eleven-month complement. 

Several occurrences to be noted from the figure are: 

(1) The log-odds are consistently greater than zero. 

(2) The lack of increased odds and significance during 
the summer months. 

(3) The similarity of RN to FL and their combined 
difference to SC in the fall. 
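
The two statistics plotted for each column can be illustrated with the Python sketch below, which cross-classifies the signs of a control series and a complement series into a 2x2 table. This is a sketch under stated assumptions rather than the thesis procedure: the function name `two_by_two`, the sample anomalies, the .5 correction in the log odds ratio, and the uncorrected Pearson chi-square are all choices made here for illustration, and the chi-square formula requires every row and column total to be nonzero.

```python
import math

def two_by_two(control, complement):
    """Return the 2x2 sign table (a, b, c, d), its log odds ratio, and chi-square.

    a: both below mean, b: control below / complement above,
    c: control above / complement below, d: both above mean.
    """
    a = b = c = d = 0
    for x, y in zip(control, complement):
        if x <= 0 and y <= 0:
            a += 1
        elif x <= 0:
            b += 1
        elif y <= 0:
            c += 1
        else:
            d += 1
    # Log odds ratio with an assumed 0.5 correction to avoid empty cells.
    log_odds = math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    chi_square = n * (a * d - b * c) ** 2 / margins   # Pearson, 1 degree of freedom
    return (a, b, c, d), log_odds, chi_square

# Hypothetical January anomalies and cumulated-complement anomalies.
january = [-1.0, -0.4, -0.2, 0.3, 0.8, 0.5, 1.1, -0.7]
cumulated = [-2.1, -0.9, 1.4, 0.6, 2.2, -0.3, 1.9, -1.5]
table, log_odds, chi_square = two_by_two(january, cumulated)
print(table, log_odds, chi_square)
```

Repeating the calculation with the complement cumulated over one more month at a time would give one column of such a plot per month.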

These indications suggested a further look at January 
versus the fall months only. This analysis is displayed 
in Figure 120. The vertical scales of Figure 119 and 
Figure 120 are the same, yet the horizontal scales differ. 
This figure has five major divisions. The left-most 
division looks at January versus singular months in the 
fall. The second division looks at January versus pairs 
of months in the fall, and so on until the right-most 
column, which is January versus the total fall rainfall. 
This figure yields no apparent significance, and unstable 
odds. 

The combined information of Figures 119 and 120 is 
mildly confusing. One possible explanation may be that 
the summer months somehow cumulate significance and deviation 
direction, in order to allow the fall contribution. 
This possible synergistic effect should be explored further. 
Figure 119. Log-odds and significance versus 
additional months cumulated through the year 
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IX. SUMMARY 


The analysis of rainfall data is carried out in a 
comprehensive way. The autoregressive Markovian model 
of the early sections could not stand up to validation, 
but it did point to some sort of dichotomy between the 
seasons. 

2x2 contingency analysis was effective in that it 
brought attention to the predictive ability of January. 
This identification of January, when followed by the 
logistic analysis was seen to be successful in two of 
the three data sets. Thus, the primary conclusion of this 
thesis is the predictive ability of January rainfall. 

The physical reasoning behind this finding must be 
left to the meteorologist. Further study of the approach 
used here may lead to improvement in seasonal or annual 
rainfall forecasts for certain climatic regions. 
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